2008, Parallel Computing
Parallel graph algorithm design is a very well studied topic. Many results have been presented for the PRAM model. However, these algorithms are inherently fine grained, and experiments show that PRAM algorithms often do not achieve the expected speedup on real machines because of large message overheads. In this paper, we present coarse grained parallel graph algorithms with small message overheads that solve the following standard graph problems related to graph matching: finding maximum matchings in convex bipartite graphs, and finding maximum weight matchings in trees. To our knowledge, these are the first efficient parallel algorithms for these problems that are designed for standard commercial parallel machines such as off-the-shelf processor clusters.
2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012
We design, implement, and evaluate algorithms for computing a matching of maximum cardinality in a bipartite graph on multicore and massively multithreaded computers. As computers with larger numbers of slower cores dominate the commodity processor market, the design of multithreaded algorithms to solve large matching problems becomes a necessity. Recent work on serial algorithms for the matching problem has shown that their performance is sensitive to the order in which the vertices are processed for matching. In a multithreaded environment, imposing a serial order in which vertices are considered for matching would lead to loss of concurrency and performance. But this raises the question: Would parallel matching algorithms on multithreaded machines improve performance over a serial algorithm?
Algorithmica, 2002
In this paper, we present deterministic parallel algorithms for the coarse grained multicomputer (CGM) and bulk-synchronous parallel computer (BSP) models which solve the following well known graph problems: (1) list ranking, (2) Euler tour construction, (3) computing the connected components and spanning forest, (4) lowest common ancestor preprocessing, (5) tree contraction and expression tree evaluation, (6) computing an ear decomposition or open ear decomposition, (7) 2-edge connectivity and biconnectivity (testing and component computation), and (8) chordal graph recognition (finding a perfect elimination ordering). The algorithms for Problems 1-7 require O(log p) communication rounds and linear sequential work per round. Our results for Problems 1 and 2 hold for arbitrary ratios n/p, i.e. they are fully scalable, and for Problems 3-8 it is assumed that n/p ≥ p^ε, ε > 0, which is true for all commercially available multiprocessors. We view the algorithms presented as an important step towards the final goal of O(1) communication rounds. Note that the number of communication rounds obtained in this paper is independent of n and grows only very slowly with respect to p. Hence, for most practical purposes, the number of communication rounds can be considered as constant. The result for Problem 1 is a considerable improvement over those previously reported. The algorithms for Problems 2-7 are the first practically relevant deterministic parallel algorithms for these problems to be used for commercially available coarse grained parallel machines. Research partially supported by the Natural Sciences and Engineering Research Council of Canada, FAPESP (Brasil), CNPq (Brasil), PROTEM-2-TCPAC (Brasil), the Commission of the European Communities (ESPRIT Long Term Research Project 20244, ALCOM-IT), DFG-SFB 376 "Massive Parallelität" (Germany), and the Région Rhône-Alpes (France).
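As a rough illustration of Problem 1 (list ranking), the sketch below simulates the textbook pointer-jumping technique sequentially. It is not the CGM/BSP algorithm of the paper; the function name `list_ranking` and the convention that the tail points to itself are assumptions made for this example. Each loop iteration stands in for one synchronous parallel round, and the number of rounds grows logarithmically with the list length.

```python
def list_ranking(succ):
    """Rank every node of a linked list by pointer jumping.

    succ[i] is the successor of node i; the tail points to itself
    (an assumed convention for this sketch).
    Returns (rank, rounds), where rank[i] is the number of links
    from node i to the tail.
    """
    n = len(succ)
    nxt = list(succ)
    # Invariant: rank[i] is the distance from i to nxt[i].
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    rounds = 0
    # Stop once every pointer has reached the tail.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # One synchronous round: both arrays are rebuilt from the old values.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
        rounds += 1
    return rank, rounds


if __name__ == "__main__":
    # The list 0 -> 2 -> 3 -> 1, with node 1 as the tail.
    print(list_ranking([2, 1, 3, 1]))
```

On a real parallel machine each round would update all nodes at once; the list comprehensions here only mimic that by reading the old arrays before replacing them.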
Information Processing Letters, 1986
A parallel O(log^3 |E|) algorithm for finding a maximal matching in a graph G(V, E) is presented. The model of computation is the CRCW-PRAM, and |V| + |E| processors are used. This algorithm is a substantial improvement upon the two previous algorithms known to us. These algorithms, by Karp and Wigderson (1984) and Lev (1980), achieve O(log^4 |E|) depth with |E|^3/log |E| and |E| + |V| processors respectively. The last one, though having a better performance than the first, only applies to bipartite graphs.
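To make the notion of a maximal matching concrete, here is a minimal sequential greedy sketch. It is not the CRCW-PRAM algorithm described above; the function name `greedy_maximal_matching` and the edge-list input format are assumptions chosen for illustration. A matching is maximal when no further edge can be added to it, which is weaker than being of maximum size.

```python
def greedy_maximal_matching(edges):
    """Scan the edges once and keep each edge whose endpoints are both free.

    edges is an iterable of (u, v) pairs (assumed input format).
    The result is maximal: every remaining edge touches a matched vertex.
    """
    matched = set()
    matching = []
    for u, v in edges:
        # Skip self-loops and edges that touch an already matched vertex.
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


if __name__ == "__main__":
    # Path a-b-c-d with the middle edge listed first.
    print(greedy_maximal_matching([("b", "c"), ("a", "b"), ("c", "d")]))
```

On the path a-b-c-d the scan keeps only the middle edge, which is maximal, although the maximum matching has two edges. The sequential scan depends on the edge order, which is the kind of dependency a parallel algorithm has to avoid.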
Information Processing Letters, 2000
The problem of computing a matching of maximum weight in a given edge-weighted graph is not known to be P-hard or in RNC. This paper presents four parallel approximation algorithms for this problem.
Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing, 2019
The Massively Parallel Computation (MPC) model serves as a common abstraction of many modern large-scale parallel computation frameworks and has recently gained a lot of importance, especially in the context of classic graph problems. In this work, we mainly consider maximal matching and maximal independent set problems in the MPC model. These problems are known to admit efficient MPC algorithms if the space available per machine is near-linear in the number n of nodes. This is not only often significantly more than what we can afford, but also allows for easy if not trivial solutions for sparse graphs, which are common in real-world large-scale graphs. We are, therefore, interested in the low-memory MPC model, where the space per machine is restricted to be strongly sublinear, that is, n^δ for any constant 0 < δ < 1.
Proceedings of the 24th International Colloquium on Automata Languages and Programming, 1997
In recent years there has been an upsurge of interest in spatial databases. A major issue is how to manipulate efficiently massive amounts of spatial data stored on disk in multidimensional spatial indexes (data structures). Construction of spatial indexes (bulk loading) has been studied intensively in the database community. The continuous arrival of massive amounts of new data makes it important to update existing indexes (bulk updating) efficiently. In this paper we present a simple, yet efficient, technique for performing bulk update and query operations on multidimensional indexes. We present our technique in terms of the so-called R-tree and its variants, as they have emerged as practically efficient indexing methods for spatial data. Our method uses ideas from the buffer tree lazy buffering technique and fully utilizes the available internal memory and the page size of the operating system. We give a theoretical analysis of our technique, showing that it is efficient both in terms of I/O communication, disk storage, and internal computation time. We also present the results of an extensive set of experiments showing that in practice our approach performs better than the previously best known bulk update methods with respect to update time, and that it produces a better quality index in terms of query performance. One important novel feature of our technique is that in most cases it allows us to perform a batch of updates and queries simultaneously. To be able to do so is essential in environments where queries have to be answered even while the index is being updated and reorganized.
Parallel Processing …, 2007
Graph algorithms are becoming increasingly important for solving many problems in scientific computing, data mining and other domains. As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements. Unfortunately, the algorithms, software, and hardware that have worked well for developing mainstream parallel scientific applications are not necessarily effective for large-scale graph problems. In this paper we present the interrelationships between graph problems, software, and parallel hardware in the current state of the art and discuss how those issues present inherent challenges in solving large-scale graph problems. The range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems.
Theoretical Computer Science, 2020
Since bipartite convex graphs emerged from industrial applications, algorithms for this class of graphs have been devised in several research areas such as scheduling, DNA analysis, and constraint programming. A bipartite graph G = (V, W, E) is convex if there exists an ordering of the vertices of W such that, for each v ∈ V, the neighbors of v are consecutive in W. In this work we describe a coarse grained parallel algorithm for the maximum matching problem in a convex bipartite graph. For p processors, the algorithm runs in O((|V|/p) lg(|V|/p) lg p) time and uses O(lg p) communication rounds. We also address a well-known problem in the area of combinatorial optimization, the Hamiltonian circuit problem, presenting a sequential linear-time algorithm to determine if a convex bipartite graph has a Hamiltonian circuit. We further show how to efficiently implement both algorithms in PRAM and coarse grained parallel models. Experimental tests performed on commercial machines show the algorithms are robust and scalable.
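The convexity definition above means each v ∈ V can be described by an interval of positions in W. As a hedged illustration, the sketch below implements the classic sequential greedy rule usually attributed to Glover: visit W in order and match each position to the available vertex whose interval ends first. This is a sequential baseline, not the coarse grained parallel algorithm of the paper; the function name `convex_bipartite_matching` and the `(lo, hi)` interval encoding are assumptions made for this example.

```python
import heapq


def convex_bipartite_matching(intervals, m):
    """Maximum matching in a convex bipartite graph by a greedy sweep.

    intervals[v] = (lo, hi): vertex v of V is adjacent to positions
    lo..hi (inclusive) of W in the convex ordering (assumed encoding).
    m is the number of vertices in W.
    Returns a dict mapping each matched v to its position w.
    """
    # Vertices of V sorted by the start of their interval.
    order = sorted(range(len(intervals)), key=lambda v: intervals[v][0])
    heap = []  # entries (interval end, v) for vertices whose interval has started
    matching = {}
    i = 0
    for w in range(m):
        # Activate every vertex whose interval starts at or before w.
        while i < len(order) and intervals[order[i]][0] <= w:
            v = order[i]
            heapq.heappush(heap, (intervals[v][1], v))
            i += 1
        # Drop vertices whose interval ended before w.
        while heap and heap[0][0] < w:
            heapq.heappop(heap)
        # Match w to the active vertex with the earliest interval end.
        if heap:
            _, v = heapq.heappop(heap)
            matching[v] = w
    return matching


if __name__ == "__main__":
    print(convex_bipartite_matching([(0, 0), (0, 1), (1, 2), (0, 0)], 3))
```

Choosing the earliest interval end is what makes the greedy choice safe: a vertex that can still be matched later is never preferred over one that is about to run out of options.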
2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, 2011
We discuss the design and implementation of new highly scalable distributed-memory parallel algorithms for two prototypical graph problems, edge-weighted matching and distance-1 vertex coloring. Graph algorithms in general have low concurrency, poor data locality, and high ratio of data access to computation costs, making it challenging to achieve scalability on massively parallel machines. We overcome this challenge by employing a variety of techniques, including speculation and iteration, optimized communication, and randomization. We present preliminary results on weak and strong scalability studies conducted on an IBM Blue Gene/P machine employing up to tens of thousands of processors. The results show that the algorithms hold strong potential for computing at petascale.
There exists a large number of theoretical results concerning parallel algorithms for graph problems. One of them is an algorithm for the perfect matching problem, which is also the central part of the algorithm for finding a maximum flow in a network. We have attempted to implement it on a parallel computer with 12 processors (instead of the theoretical O(n^3.5 m) processors). In pursuing this goal we ran into a number of practical problems. The aim of this paper is to discuss them as well as the experimental results of our implementation.
1997
Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing.
1982
An efficient parallel algorithm to obtain maximum matchings in convex bipartite graphs is developed. This algorithm can be used to obtain efficient parallel algorithms for several scheduling problems. Some examples are: job scheduling with release times and deadlines; scheduling to minimize maximum cost; and preemptive scheduling to minimize maximum completion time.
Algorithmica, 1990
In this thesis we examine three problems in graph theory and propose efficient parallel algorithms for solving them. We also introduce a number of parallel algorithmic techniques.
SIAM Journal on Computing, 1984
In this paper, we present efficient parallel algorithms for the following graph problems: finding the lowest common ancestors for vertex pairs of a directed tree; finding all fundamental cycles, a directed spanning forest, all bridges, all bridge-connected components, all separation vertices, all biconnected components, and testing the biconnectivity of an undirected graph. All these algorithms achieve the O(lg n) time bound, with the first two algorithms using n[n/lg n] processors and the remaining algorithms using n[n/lg n] processors. In all cases, our algorithms are better than the previously known algorithms and in most cases reduce the number of processors used by a factor of n lg n. Moreover, our algorithms are optimal with respect to the time-processor product for dense graphs, with the exception of the first two algorithms. The machine model we use is the PRAM which is a SIMD model allowing simultaneous reads but not simultaneous writes to the same memory location.
Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multiprocessors with a case study of parallel tree and connectivity algorithms. The problems we study represent a wide range of irregular problems that have fast theoretic parallel algorithms but no known efficient parallel implementations that achieve speedup without seriously restrictive assumptions about the inputs. We believe our techniques will be of practical impact in solving large-scale graph problems.
2000
We study the links between the design and analysis of graph algorithms in coarse grained parallel models and the behavior of the resulting code on current parallel machines and clusters. We conclude that the coarse grained CGM model is well suited to designing competitive algorithms, and that it is therefore now possible to aim at developing portable, predictable, and efficient parallel code for graph problems.
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. While previous studies have shown that parallel graph algorithms can speed up on SMPs, the systems' reliance on cache microprocessors limits performance. The MTA's latency tolerant processors and hardware support for fine-grain synchronization make performance a function of parallelism. Since parallel graph algorithms have an abundance of parallelism, they perform and scale significantly better on the MTA. We describe and give a performance model for each architecture. We analyze the performance of the two algorithms and discuss how the features of each architecture affect algorithm development, ease of programming, performance, and scalability.
Dartmouth College, Hanover, NH, 1992
In this thesis we examine three problems in graph theory and propose efficient parallel algorithms for solving them. We also introduce a number of parallel algorithmic techniques.
Database Systems for Advanced Applications, 2015
Subgraph matching is the task of finding all matches of a query graph in a large data graph, which is known as an NP-complete problem. Many algorithms are proposed to solve this problem using CPUs. In recent years, Graphics Processing Units (GPUs) have been adopted to accelerate fundamental graph operations such as breadth-first search and shortest path, owing to their parallelism and high data throughput. The existing subgraph matching algorithms, however, face challenges in mapping backtracking problems to the GPU architectures. Moreover, the previous GPU-based graph algorithms are not designed to handle intermediate and final outputs. In this paper, we present a simple and GPU-friendly method for subgraph matching, called GpSM, which is designed for massively parallel architectures. We show that GpSM outperforms the state-of-the-art algorithms and efficiently answers subgraph queries on large graphs.