We describe a system that uses automated planning to synthesize correct and efficient parallel graph programs from high-level algorithmic specifications. Automated planning allows us to use constraints to declaratively encode program transformations such as scheduling, implementation selection, and insertion of synchronization. Each plan emitted by the planner satisfies all constraints simultaneously, and corresponds to a composition of these transformations. In this way, we obtain an integrated compilation approach for a very challenging problem domain. We have used this system to synthesize parallel programs for four graph problems: triangle counting, maximal independent set computation, preflow-push maxflow, and connected components. Experiments on a variety of inputs show that the synthesized implementations perform competitively with handwritten, highly-tuned code.
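To make the target problem domain concrete, the following is a minimal Java sketch of a parallel triangle-counting kernel of the kind such a system synthesizes; the CSR-style sorted adjacency representation and all names are illustrative assumptions, not the planner's actual output.

```java
import java.util.stream.IntStream;

// Illustrative parallel triangle counting over a CSR-style adjacency
// structure with sorted neighbor lists. Each triangle (u < v < w) is
// counted exactly once by intersecting the "forward" neighbors of u and v.
public final class TriangleCount {
    public static long count(int[][] adj) {
        return IntStream.range(0, adj.length).parallel().mapToLong(u -> {
            long local = 0;
            for (int v : adj[u]) {
                if (v <= u) continue;                   // orient each edge as u -> v with u < v
                local += commonNeighborsAbove(adj[u], adj[v], v);
            }
            return local;
        }).sum();
    }

    // Counts common entries w of two sorted lists with w > v.
    private static long commonNeighborsAbove(int[] a, int[] b, int v) {
        long c = 0;
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { if (a[i] > v) c++; i++; j++; }
        }
        return c;
    }
}
```

Counting each triangle once via the ordering u < v < w avoids duplicate work and the need for synchronization on shared counters.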
ACM SIGPLAN Notices, 2012
ABSTRACT Algorithms in new application areas like machine learning and network analysis use "irregular" data structures such as graphs, trees and sets. Writing efficient parallel code in these problem domains is very challenging because it requires the programmer to make many choices: a given problem can usually be solved by several algorithms, each algorithm may have many implementations, and the best choice of algorithm and implementation can depend not only on the characteristics of the parallel platform but also on properties of the input data such as the structure of the graph. One solution is to permit the application programmer to experiment with different algorithms and implementations without writing every variant from scratch. Auto-tuning to find the best variant is a more ambitious solution. These solutions require a system for automatically producing efficient parallel implementations from high-level specifications. Elixir, the system described in this paper, is the first step towards this ambitious goal. Application programmers write specifications that consist of an operator, which describes the computations to be performed, and a schedule for performing these computations. Elixir uses sophisticated inference techniques to produce efficient parallel code from such specifications. We used Elixir to automatically generate many parallel implementations for three irregular problems: breadth-first search, single source shortest path, and betweenness-centrality computation. Our experiments show that the best generated variants can be competitive with handwritten code for these problems from other research groups; for some inputs, they even outperform the handwritten versions.
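As a rough illustration of the kind of implementation variant such a system explores, here is a hedged Java sketch of single-source shortest paths in which the operator is edge relaxation and the schedule is a bulk-synchronous sweep over a frontier; the structure and all names are assumptions made for illustration and are not code emitted by Elixir.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.stream.IntStream;

// Illustrative round-based parallel SSSP: the operator is edge relaxation,
// applied with compare-and-set so concurrent relaxations of the same node
// are safe; the schedule is a bulk-synchronous sweep over the current frontier.
public final class ParallelSssp {
    public static AtomicIntegerArray run(int[][] adj, int[][] wgt, int src) {
        int n = adj.length;
        AtomicIntegerArray dist = new AtomicIntegerArray(n);
        for (int i = 0; i < n; i++) dist.set(i, Integer.MAX_VALUE / 2);
        dist.set(src, 0);

        boolean[] frontier = new boolean[n];
        frontier[src] = true;
        boolean changed = true;
        while (changed) {
            boolean[] next = new boolean[n];
            changed = IntStream.range(0, n).parallel()
                .filter(u -> frontier[u])
                .mapToObj(u -> {
                    boolean relaxedAny = false;
                    int du = dist.get(u);
                    for (int k = 0; k < adj[u].length; k++) {
                        int v = adj[u][k];
                        int nd = du + wgt[u][k];
                        int old = dist.get(v);
                        while (nd < old) {                       // relax edge (u, v)
                            if (dist.compareAndSet(v, old, nd)) { next[v] = true; relaxedAny = true; break; }
                            old = dist.get(v);                    // lost a race; re-read and retry
                        }
                    }
                    return relaxedAny;
                })
                .reduce(false, Boolean::logicalOr);
            System.arraycopy(next, 0, frontier, 0, n);
        }
        return dist;
    }
}
```

Other schedules (priority-ordered worklists, chunked delta-stepping, and so on) would change the traversal order while keeping the same relaxation operator.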
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016
Declarative programming has been hailed as a promising approach to parallel programming since it makes it easier to reason about programs while hiding the implementation details of parallelism from the programmer. However, its advantage is also its disadvantage, as it leaves the programmer with no straightforward way to optimize programs for performance. In this paper, we introduce Coordinated Linear Meld (CLM), a concurrent forward-chaining linear logic programming language with a declarative way to coordinate the execution of parallel programs, allowing the programmer to specify arbitrary scheduling and data partitioning policies. Our approach allows the programmer to write graph-based declarative programs and then optionally use coordination to fine-tune parallel performance. In this paper we specify the set of coordination facts, discuss their implementation in a parallel virtual machine, and show, through examples, how they can be used to optimize parallel execution. We compare the execution of CLM programs against the original uncoordinated Linear Meld and several other frameworks.
2010
Computations on unstructured graphs are challenging to parallelize because dependences in the underlying algorithms are usually complex functions of runtime data values, thwarting static parallelization. One promising general-purpose parallelization strategy for these algorithms is optimistic parallelization.
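A minimal Java sketch of the optimistic pattern, assuming per-node locks and a caller that re-queues activities whose lock acquisition fails; the names and structure are illustrative and do not correspond to any particular system's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch of optimistic parallelization: an activity speculatively
// acquires locks on every node in its neighborhood; if any acquisition fails,
// the activity releases what it holds and reports a conflict so the caller can
// re-queue it for a later retry.
public final class OptimisticActivity {
    public static boolean tryExecute(int[] neighborhood, ReentrantLock[] nodeLocks, Runnable operator) {
        List<ReentrantLock> held = new ArrayList<>();
        for (int node : neighborhood) {
            ReentrantLock lock = nodeLocks[node];
            if (!lock.tryLock()) {                       // conflict with a concurrent activity
                for (ReentrantLock h : held) h.unlock(); // roll back: release everything acquired
                return false;                            // caller re-queues this activity
            }
            held.add(lock);
        }
        try {
            operator.run();                              // neighborhood is private; apply the operator
            return true;
        } finally {
            for (ReentrantLock h : held) h.unlock();
        }
    }
}
```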
The focus of traditional scientific computing has been on solving large systems of PDEs (and the corresponding linear algebra problems that they induce). Hardware architectures, computer systems, and software platforms have evolved together to efficiently support solving these kinds of problems. Similar attention has not been devoted to solving large-scale graph problems. Recently this class of applications has seen increased attention. The irregular, nonlocal, and dynamic characteristics of these problems require new programming techniques to adapt them to modern HPC systems offering multiple levels of parallelism. We describe a library for implementing graph algorithms based on asynchronous execution of fine-grained, concurrent operations. Prototype implementations of two graph kernels, which combine lightweight graph metadata transactions with generalized active messages, demonstrate that it is possible to implement graph applications which efficiently leverage both shared- and distributed-memory parallelism.
14th Twente Student Conference on IT, January 21st, 2011
Despite the wide availability of multi-core processors and the popularity of Java, there is currently no Java library for parallel graph algorithms available. Such a library would enable all Java programmers, especially those who work on program verification and model checking, to use graph algorithms efficiently. To find out what would be a good design for this library, we have made a start on it. We have created a general design for algorithms and graphs in the library. We have also implemented the reachability and connected components algorithms, both sequential and parallel. In this paper we describe the algorithms we have implemented and our design of the library, focusing on the design choices we made.
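As an indication of what such a parallel primitive might look like, here is a hedged Java sketch of level-synchronous parallel reachability; the method shape and names are assumptions for illustration and not the actual API of the library described.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Collectors;

// Illustrative level-synchronous parallel reachability from a source node:
// each round expands the whole frontier in parallel, and an atomic visited
// flag per node guarantees every node is claimed exactly once.
public final class ParallelReachability {
    public static boolean[] reachable(int[][] adj, int src) {
        int n = adj.length;
        AtomicBoolean[] visited = new AtomicBoolean[n];
        for (int i = 0; i < n; i++) visited[i] = new AtomicBoolean(false);
        visited[src].set(true);

        List<Integer> frontier = List.of(src);
        while (!frontier.isEmpty()) {
            frontier = frontier.parallelStream()
                .flatMap(u -> Arrays.stream(adj[u]).boxed())
                .filter(v -> visited[v].compareAndSet(false, true)) // claim unvisited neighbors
                .collect(Collectors.toList());
        }

        boolean[] result = new boolean[n];
        for (int i = 0; i < n; i++) result[i] = visited[i].get();
        return result;
    }
}
```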
Proceedings of the 15th ACM SIGPLAN International Conference on Software Language Engineering
Domain-specific language compilers need to close the gap between the domain abstractions of the language and the low-level concepts of the target platform. This can be challenging to achieve for compilers targeting multiple platforms with potentially very different computing paradigms. In this paper, we present a multi-target, multi-paradigm DSL compiler for algorithmic graph processing. Our approach centers around an intermediate representation and reusable, composable transformations to be shared between the different compiler targets. These transformations embrace abstractions that align closely with the concepts of a particular target platform, and disallow abstractions that are semantically more distant. We report on our experience implementing the compiler and highlight some of the challenges and requirements for applying language workbenches in industrial use cases. CCS Concepts: • Software and its engineering → Domain specific languages.
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization - CGO '14, 2014
Large-scale graph processing, with its massive data sets, requires distributed processing. However, conventional frameworks for distributed graph processing, such as Pregel, use non-traditional programming models that are well-suited for parallelism and scalability but inconvenient for implementing non-trivial graph algorithms. In this paper, we use Green-Marl, a Domain-Specific Language for graph analysis, to intuitively describe graph algorithms and extend its compiler to generate equivalent Pregel implementations. Using the semantic information captured by Green-Marl, the compiler applies a set of transformation rules that convert imperative graph algorithms into Pregel's programming model. Our experiments show that the Pregel programs generated by the Green-Marl compiler perform similarly to manually coded Pregel implementations of the same algorithms. The compiler is even able to generate a Pregel implementation of a complicated graph algorithm for which a manual Pregel implementation is very challenging.
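For readers unfamiliar with the target model, the following is a hedged Java sketch of a vertex-centric connected-components program in a Pregel-like style (minimum-label propagation); the Context interface and all names are simplified stand-ins, not Pregel's or Giraph's actual API and not the Green-Marl compiler's generated code.

```java
import java.util.List;

// Illustrative vertex-centric connected components by minimum-label
// propagation: every vertex starts with its own id, repeatedly adopts the
// smallest id it has seen, and forwards changes to its neighbors. The
// Context interface is a simplified stand-in for a Pregel-like runtime.
interface Context {
    void sendToNeighbors(long value);    // deliver a message in the next superstep
    void voteToHalt();                   // sleep until a new message arrives
}

public final class MinLabelComponents {
    private long componentId;

    public MinLabelComponents(long vertexId) {
        this.componentId = vertexId;     // initial label is the vertex's own id
    }

    public void compute(long superstep, List<Long> messages, Context ctx) {
        long smallest = componentId;
        for (long m : messages) smallest = Math.min(smallest, m);

        if (superstep == 0 || smallest < componentId) {
            componentId = smallest;
            ctx.sendToNeighbors(componentId);   // propagate the improved label
        }
        ctx.voteToHalt();                       // halts until a smaller label is received
    }
}
```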
Graph Transformation, 2016
We show how to generate efficient C code for a high-level domain-specific language for graphs. The experimental language GP 2 is based on graph transformation rules and aims to facilitate formal reasoning on programs. Implementing graph programs is challenging because rule matching is expensive in general. GP 2 addresses this problem by providing rooted rules which under mild conditions can be matched in constant time. Using a search plan, our compiler generates C code for matching rooted graph transformation rules. We present run-time experiments with our implementation in a case study on checking graphs for two-colourability: on grid graphs of up to 100,000 nodes, the compiled GP 2 program is as fast as the tailor-made C program given by Sedgewick.
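For reference, here is a plain Java rendering of the two-colourability check used in the case study (BFS colouring with conflict detection); it illustrates the computation only and is not the C code emitted by the GP 2 compiler.

```java
import java.util.ArrayDeque;

// Illustrative two-colourability (bipartiteness) check: BFS-colour each
// connected component and report failure if two adjacent nodes end up with
// the same colour (i.e. the graph contains an odd cycle).
public final class TwoColourable {
    public static boolean check(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];                   // 0 = uncoloured, colours are 1 and 2
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int s = 0; s < n; s++) {
            if (colour[s] != 0) continue;
            colour[s] = 1;
            queue.add(s);
            while (!queue.isEmpty()) {
                int u = queue.poll();
                for (int v : adj[u]) {
                    if (colour[v] == 0) {
                        colour[v] = 3 - colour[u];   // give v the opposite colour
                        queue.add(v);
                    } else if (colour[v] == colour[u]) {
                        return false;                // conflict: not two-colourable
                    }
                }
            }
        }
        return true;
    }
}
```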
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. While previous studies have shown that parallel graph algorithms can speed up on SMPs, the systems' reliance on cache-based microprocessors limits performance. The MTA's latency-tolerant processors and hardware support for fine-grain synchronization make performance a function of parallelism. Since parallel graph algorithms have an abundance of parallelism, they perform and scale significantly better on the MTA. We describe and give a performance model for each architecture. We analyze the performance of the two algorithms and discuss how the features of each architecture affect algorithm development, ease of programming, performance, and scalability.
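As a reminder of the algorithmic pattern behind one of the kernels, here is a hedged Java sketch of list ranking by pointer jumping, which computes each element's distance to the tail in O(log n) synchronous rounds; it is an illustrative sketch, not the SMP- or MTA-specific implementations evaluated in the paper.

```java
import java.util.stream.IntStream;

// Illustrative parallel list ranking by pointer jumping: each element learns
// its distance to the tail of the list in O(log n) synchronous rounds. Arrays
// are double-buffered each round so all reads see the previous round's values.
public final class ListRanking {
    // next[i] is the successor of element i; the tail points to itself.
    public static int[] rank(int[] next) {
        int n = next.length;
        int[] rank = new int[n];
        int[] succ = next.clone();
        IntStream.range(0, n).parallel().forEach(i -> rank[i] = (succ[i] == i) ? 0 : 1);

        for (int span = 1; span < n; span <<= 1) {
            int[] prevSucc = succ.clone();           // snapshot of the previous round
            int[] prevRank = rank.clone();
            IntStream.range(0, n).parallel().forEach(i -> {
                rank[i] = prevRank[i] + prevRank[prevSucc[i]];
                succ[i] = prevSucc[prevSucc[i]];     // jump the pointer two hops ahead
            });
        }
        return rank;
    }
}
```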
Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multiprocessors, with a case study of parallel tree and connectivity algorithms. The problems we study represent a wide range of irregular problems that have fast theoretical parallel algorithms but no known efficient parallel implementations that achieve speedup without seriously restricting assumptions about the inputs. We believe our techniques will be of practical impact in solving large-scale graph problems.