2019 IEEE International Symposium on Workload Characterization (IISWC)
Graph analytics power a range of applications in areas as diverse as finance, networking, and business logistics. A common property of graphs used in the domain of graph analytics is a power-law distribution of vertex connectivity, wherein a small number of vertices are responsible for a high fraction of all connections in the graph. These richly-connected (hot) vertices inherently exhibit high reuse. However, their sparse distribution in memory leads to a severe underutilization of on-chip cache capacity. Prior works have proposed lightweight skew-aware vertex reordering that places hot vertices adjacent to each other in memory, reducing the cache footprint of hot vertices and thus improving cache efficiency. However, in doing so, they may inadvertently destroy the inherent community structure within the graph, which may negate the performance gains achieved from the reduced footprint of hot vertices. In this work, we study existing reordering techniques and demonstrate the inherent tension between reducing the cache footprint of hot vertices and preserving original graph structure. We quantify the potential performance loss due to disruption in graph structure for different graph datasets. We further show that reordering techniques that employ fine-grain reordering significantly increase misses in the higher-level caches, even when they reduce misses in the last-level cache. To overcome the limitations of existing reordering techniques, we propose Degree-Based Grouping (DBG), a novel lightweight reordering technique that employs a coarse-grain reordering to largely preserve graph structure while reducing the cache footprint of hot vertices. Our evaluation on 40 combinations of various graph applications and datasets shows that, compared to a baseline with no reordering, DBG yields an average application speed-up of 16.8% vs 11.6% for the best-performing existing lightweight technique.
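The coarse-grain idea behind DBG can be illustrated as follows: bucket vertices into a handful of degree groups, place the hottest groups first, and keep vertices in their original order within each group so that community structure is mostly retained. This is only a minimal sketch of the approach; the input format (`out_neighbors` as a dict of adjacency lists), the number of groups, and the power-of-two thresholds around the average degree are illustrative assumptions, not the paper's exact scheme.

```python
from collections import defaultdict

def dbg_reorder(num_vertices, out_neighbors, num_groups=8):
    """Sketch of Degree-Based Grouping: coarse degree groups,
    hottest group first, original vertex order preserved within
    each group. `out_neighbors`: vertex -> list of neighbors."""
    degree = [len(out_neighbors.get(v, [])) for v in range(num_vertices)]
    avg = sum(degree) / num_vertices
    # Descending thresholds at power-of-two multiples of the average
    # degree (an illustrative choice of group boundaries).
    thresholds = [avg * (2 ** i) for i in range(num_groups - 2, -1, -1)]

    def group_of(d):
        for g, t in enumerate(thresholds):
            if d >= t:
                return g
        return len(thresholds)  # coldest group

    groups = defaultdict(list)
    for v in range(num_vertices):           # scan in original order
        groups[group_of(degree[v])].append(v)

    new_id, next_id = {}, 0
    for g in range(len(thresholds) + 1):    # hottest group first
        for v in groups[g]:
            new_id[v] = next_id
            next_id += 1
    return new_id
```

On a small star graph, the single hub lands at the front of the new numbering while the leaves keep their relative order, which is the footprint-versus-structure trade-off the abstract describes.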
2015
We have experimented with reordering algorithms on sparse very large graphs (VLGs), considering only undirected, unweighted, sparse, and huge graphs, i.e. G = (V, E) with n = |V| ranging from millions to billions of nodes and with |E| = O(|V|). The problem of reordering a matrix to improve computation time (and sometimes memory usage) is traditional in numerical algorithms, but in this short paper we focus on results obtained for the approximate computation of the diameter of a sparse VLG (with some graphs on various different computers). The problem of reordering a graph has already been pointed out, explicitly or implicitly, by many people, from the numerical community but also from the graph community, such as the authors of the Louvain algorithm when they write that choosing an order is thus worth studying since it could accelerate the Louvain algorithm. Our experimental results clearly show that it can be worthwhile (and simple) to preprocess a sparse VLG with a reordering algorithm.
Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées, 2022
Graph algorithms have inherent characteristics, including data-driven computations and poor locality. These characteristics expose graph algorithms to several challenges, because most well-studied (parallel) abstractions and implementations are not suitable for them. In our previous work [21, 22, 24], we showed how to use some complex-network properties, including community structure and heterogeneity of node degree, to improve performance through proper memory management (Cn-order) and appropriate thread scheduling (comm-deg-scheduling). In recent work [23], Besta et al. proposed log(graph), a graph representation that outperforms existing graph compression algorithms. In this paper, we show that our graph numbering heuristic and our scheduling heuristics can be improved when they are combined with the log(graph) data structure. Experiments were made on multi-core machines. For example, on one node of a multi-core machine (Troll from Grid'5000), we showed that when combining our prev...
Automation and Remote Control, 2018
We propose an algorithm, linear in both running time and memory, that constructs a sequence of operations transforming any given directed graph, in which every vertex has degree at most two, into any other given graph of the same type at minimal total cost. We call such a sequence the shortest one. We allow four standard operations of re-gluing graphs with equal cost and two additional operations of inserting and deleting a connected section of edges that also have equal cost. We prove that the algorithm finds a minimum under this restriction on the costs.
2011 Proceedings of the Thirteenth Workshop on Algorithm Engineering and Experiments (ALENEX), 2011
We present an I/O-efficient algorithm for topologically sorting directed acyclic graphs (DAGs). No provably I/O-efficient algorithm for this problem is known, and likewise the performance of our algorithm, which we call IterTS, may be poor in the worst case. However, our experiments show that IterTS achieves good performance in practice. The strategy of IterTS can be summarized as follows. We call an edge satisfied if its tail has a smaller number than its head. A numbering satisfying at least half the edges in the DAG is easy to find: a random numbering is expected to have this property. IterTS starts with such a numbering and then iteratively corrects the numbering to satisfy more and more edges until all edges are satisfied. To evaluate IterTS, we compared its running time to those of three competitors: PeelTS, an I/O-efficient implementation of the standard strategy of iteratively removing sources and sinks; ReachTS, an I/O-efficient implementation of a recent parallel divide-and-conquer algorithm based on reachability queries; and SeTS, standard DFS-based topological sorting built on top of a semi-external DFS algorithm. In our evaluation on various types of input graphs, IterTS consistently outperformed PeelTS and ReachTS, by at least an order of magnitude in most cases. SeTS outperformed IterTS on most graphs whose vertex sets fit in memory. However, IterTS often came close to the running time of SeTS on these inputs and, more importantly, SeTS was not able to process graphs whose vertex sets were beyond the size of main memory, while IterTS was able to process such inputs efficiently.
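The starting point of the strategy above is easy to check directly: an edge (u, v) is satisfied when the tail's number is below the head's, a numbering of a DAG is a topological order exactly when every edge is satisfied, and a random numbering satisfies each edge with probability 1/2. A minimal in-memory sketch (function names are mine, not IterTS's):

```python
import random

def satisfied_fraction(edges, numbering):
    """Fraction of edges (u, v) with numbering[u] < numbering[v];
    equals 1.0 iff the numbering is a topological order of the DAG."""
    return sum(numbering[u] < numbering[v] for u, v in edges) / len(edges)

def random_numbering(n, seed=None):
    """Distinct random numbers for n vertices. Each edge is satisfied
    with probability 1/2, so in expectation at least half the edges
    are satisfied -- the starting numbering IterTS iterates from."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm
```

Averaged over many random numberings of a small path DAG, the satisfied fraction hovers around one half, matching the expectation argument in the abstract.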
2017
Complex networks are sets of entities in a relationship, modeled by graphs where nodes represent entities and edges between nodes represent relationships. Graph algorithms have inherent characteristics, including data-driven computations and poor locality. These characteristics expose graph algorithms to several challenges, because most well-studied (parallel) abstractions and implementations are not suitable for them. This work shows how we use some complex-network properties, including community structure and heterogeneity of node degree, to tackle one of the main challenges: improving performance, by improving memory locality and by providing proper thread scheduling. In this paper, we first formalize complex-network ordering for cache-miss reduction as a well-known NP-complete problem, the optimal linear arrangement problem; we then propose cn-order, a heuristic that outperforms very recent graph orders. Secondly, we formalize the degree-aware scheduling problem as another ...
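The objective of the optimal linear arrangement problem that the ordering question reduces to is simple to state: sum, over all edges, the distance between the endpoints' positions in the arrangement. A lower cost keeps adjacent vertices close together in memory, which serves as a proxy for fewer cache misses. A one-function sketch (the function name is mine):

```python
def linear_arrangement_cost(edges, position):
    """Optimal-linear-arrangement objective: total distance, over all
    edges (u, v), between the endpoints' positions in the layout.
    Finding a position map minimizing this is NP-complete, which is
    why heuristics such as cn-order are used in practice."""
    return sum(abs(position[u] - position[v]) for u, v in edges)
```

For a 3-vertex path laid out in path order the cost is 2; swapping the last two vertices raises it to 3, showing how the objective penalizes separating neighbors.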
2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Graph analytics power a range of applications in areas as diverse as finance, networking and business logistics. A common property of graphs used in the domain of graph analytics is a power-law distribution of vertex connectivity, wherein a small number of vertices are responsible for a high fraction of all connections in the graph. These richly-connected (hot) vertices inherently exhibit high reuse. However, this work finds that state-of-the-art hardware cache management schemes struggle to capitalize on this reuse due to the highly irregular access patterns of graph analytics. In response, we propose GRASP, domain-specialized cache management at the last-level cache for graph analytics. GRASP augments existing cache policies to maximize reuse of hot vertices by protecting them against cache thrashing, while maintaining sufficient flexibility to capture the reuse of other vertices as needed. GRASP keeps hardware cost negligible by leveraging lightweight software support to pinpoint hot vertices, thus eliding the need for storage-intensive prediction mechanisms employed by state-of-the-art cache management schemes. On a set of diverse graph-analytic applications with large high-skew graph datasets, GRASP outperforms prior domain-agnostic schemes on all datapoints, yielding an average speed-up of 4.2% (max 9.4%) over the best-performing prior scheme. GRASP remains robust on low-/no-skew datasets, whereas prior schemes consistently cause a slowdown.
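Why the software support can stay lightweight: once a skew-aware reordering places hot vertices first, their property data occupies a single contiguous address range, so distinguishing a hot access reduces to a bounds check rather than per-vertex bookkeeping. A sketch of that idea (the interface and names are assumptions for illustration, not GRASP's actual hardware/software contract):

```python
def make_hot_region_check(base, hot_bytes):
    """Sketch: software passes the address range [base, base + hot_bytes)
    covering the reordered hot vertices' property data; classifying an
    access as hot is then a single range comparison, with no
    storage-intensive per-line prediction state."""
    def is_hot(addr):
        return base <= addr < base + hot_bytes
    return is_hot
```

For example, with a hot region of 256 bytes starting at `0x1000`, addresses `0x1000` through `0x10ff` classify as hot and everything else as cold.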
Proceedings of the LinkKDD workshop at the …, 2004
Graphs form the foundation of many real-world datasets ranging from Internet connectivity to social networks. Yet despite this underlying structure, the size of these datasets presents a nearly insurmountable obstacle to understanding the essential character of the data. We want to understand "what the graph looks like;" we want to know which vertices and edges are important and what are the significant features in the graph. For a communication network, such an understanding entails recognizing the overall design of the network (e.g., hub-and-spoke, mesh, backbone), as well as identifying the "important" nodes and links.
2012 19th International Conference on High Performance Computing, 2012
Scalable analysis of massive graphs has become a challenging issue in high performance computing environments. ScaleGraph is an X10 library aimed at large-scale graph analysis scenarios. This paper evaluates the scalability of the ScaleGraph library for degree distribution calculation, betweenness centrality, and spectral clustering algorithms. We evaluate scalability by analyzing a synthetic Kronecker graph with 40.3 million edges (for all three algorithms), and a real social network with 69 million edges (for degree distribution calculation) on the Tsubame 2.0 distributed memory environment.
2010
In this paper we describe speeding up the calculation of graph metrics and layout in NodeXL by exploiting the parallel architecture of modern Graphics Processing Units (GPUs), specifically the Compute Unified Device Architecture (CUDA) by Nvidia. Graph centrality metrics like Eigenvector, Betweenness, PageRank and layout algorithms like Fruchterman-Reingold are essential components of Social Network Analysis (SNA). With the growth in adoption of SNA in different domains and the increasing availability of huge networked datasets for analysis, social network analysts are looking for tools that are faster and more scalable. Our results show up to 802 times speedup for a Fruchterman-Reingold graph layout and up to 17,972 times speedup for Eigenvector centrality metric calculations.
1. Introduction NodeXL [3] is a network data analysis and visualization [10] plug-in for Microsoft Excel 2007 that provides a powerful and simple means to graph data contained in a spreadsheet. Data may be impor...