1991, Springer eBooks
…
18 pages
2010
This thesis reviews selected topics from the theory of parallel computation. The research begins with a survey of the proposed models of parallel computation, examining the characteristics of each model and discussing its use for theoretical studies or for practical applications. It then employs common simulation techniques to evaluate the computational power of these models. The simulations establish certain relations between the models before advancing to a detailed study of parallel complexity theory, which is the subject of the second part of the thesis. The second part examines classes of feasible highly parallel problems and investigates the limits of parallelization. It is concerned with the benefits of parallel solutions and the extent to which they can be applied to all problems. It analyzes the parallel complexity of various well-known tractable problems and discusses the automatic parallelization of efficient sequential algorithms. Moreover, it ...
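For illustration only (this sketch is not from the thesis): one standard simulation technique for relating PRAM variants is to emulate a single Priority-CRCW write step on a model that forbids concurrent writes by sorting the write requests, which is where the usual O(log p) slowdown comes from. The function name and data layout below are ours.

```python
# Illustrative sketch: one Priority-CRCW write step resolved by sorting, the
# kind of step-by-step simulation used to compare PRAM variants.

def simulate_crcw_write_step(memory, requests):
    """Apply one Priority-CRCW step. `requests` is a list of
    (processor_id, address, value); the lowest processor id wins on conflicts."""
    # Sort by (address, processor_id) so the winning request for each address
    # comes first; on a weaker model this sort is what costs the slowdown.
    requests = sorted(requests, key=lambda r: (r[1], r[0]))
    last_addr = None
    for pid, addr, value in requests:
        if addr != last_addr:  # first (highest-priority) writer to this cell
            memory[addr] = value
            last_addr = addr
    return memory

if __name__ == "__main__":
    mem = [0] * 8
    # Processors 2 and 5 both write to cell 3; processor 2 wins under priority.
    print(simulate_crcw_write_step(mem, [(5, 3, 99), (2, 3, 7), (1, 6, 4)]))
```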
Journal of Parallel and Distributed Computing, 1991
The many revolutionary changes brought about by the integrated chip (in the form of significant improvements in processing, storage, and communications) have also brought about a host of related problems for designers and users of parallel and distributed systems. These systems develop and proliferate at an amazing momentum, motivating research in the understanding and testing of complex distributed systems. Unfortunately, these relatively expensive systems are being designed, built, used, refined, and rebuilt (at perhaps an avoidable expense) even before we have developed methodology for understanding the underlying principles of their behavior. Though it is not realistic to expect that the current rate of manufacturing can be slowed down to accommodate research in design principles, it behooves us to bring attention to the importance of design methodology and performance understanding of such systems and, in this way, to attempt to influence parallel system design in a positive manner. At the present time, there is considerable debate among various schools of thought on parallel machine architectures, with different schools proposing different architectures and design philosophies. Consider, for example, one such debate involving tightly coupled systems. Early on, Minsky [1] conjectured a somewhat pessimistic bound of log n for typical speedup on n processors. Since then, researchers [2] have shown that certain characteristics of programs, such as the DO loops in Fortran, can often be exploited to yield more optimistic levels of speedup. Other researchers [3] counter this kind of optimism by pointing out that parallel and vector processing has limitations in potential speedup (i.e., Amdahl's law) to the extent that speedup is bounded from above by n/(s·n + 1 − s), where s is the fraction of a computation that must be done serially. This suggests that it makes more sense to first concentrate on achieving the maximum speedup possible with a single powerful processor. In this view, the distributed approach is not as attractive an option. More recently, work on hypercubes [4] appears to indicate that
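For concreteness, a minimal sketch of the speedup bound quoted above (Amdahl's law); the function name and the sample numbers below are ours, chosen only to show how the bound saturates:

```python
# Amdahl's law: with serial fraction s, n processors give speedup
# at most n / (s*n + 1 - s).

def amdahl_speedup(n, s):
    """Upper bound on speedup with n processors and serial fraction s."""
    return n / (s * n + 1 - s)

if __name__ == "__main__":
    for n in (4, 16, 64, 1024):
        # Even a 5% serial fraction caps speedup near 1/s = 20 as n grows.
        print(n, round(amdahl_speedup(n, 0.05), 2))
```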
2011
For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in "regular" algorithms that use dense arrays, such as finite-differences and FFTs. In this paper, we argue that the dependence graph is not a suitable abstraction for algorithms in new application areas like machine learning and network analysis in which the key data structures are "irregular" data structures like graphs, trees, and sets.
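As a hypothetical illustration of that contrast (the names and data below are ours, not the paper's): for a regular stencil the dependences follow from the loop bounds alone, whereas for a worklist-style graph algorithm the "dependences" are the edges actually touched, which are known only once the input graph is available.

```python
# Regular vs. irregular: static dependences for a 1-D stencil, run-time
# dependences for breadth-first search on an arbitrary graph.
from collections import deque

def stencil_dependences(n):
    """Dependences of a[i] = f(a[i-1], a[i+1]) are known from n alone."""
    return [(i, i - 1) for i in range(1, n)] + [(i, i + 1) for i in range(n - 1)]

def bfs_touched_edges(adj, source):
    """For BFS, the work and its ordering depend on the input graph itself."""
    seen, touched, queue = {source}, [], deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            touched.append((u, v))  # only discoverable at run time
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return touched

if __name__ == "__main__":
    print(stencil_dependences(4))
    print(bfs_touched_edges({0: [1, 2], 1: [2], 2: []}, 0))
```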
Texts in Computational Science and Engineering, 2010
1991
In recent years, powerful theoretical techniques have been developed for supporting communication, synchronization and fault tolerance in general purpose parallel computing. The proposition of this thesis is that different techniques should be used to support different algorithms. The determining factor is granularity, or the extent to which an algorithm uses long blocks for communication between processors. We consider the Block PRAM model of Aggarwal, Chandra and Snir, a synchronous model of parallel computation in which the processors communicate by accessing a shared memory. In the Block PRAM model, there is a time cost for each access by a processor to a block of locations in the shared memory. This feature of the model encourages the use of long blocks for communication. In the thesis we present Block PRAM algorithms and lower bounds for specific problems on arrays, lists, expression trees, graphs, strings, binary trees and butterflies. These results introduce useful basic techniques for parallel computation in practice, and provide a classification of problems and algorithms according to their granularity. Also presented are optimal algorithms for universal hashing and skewing, which are techniques for supporting conflict-free memory access in general- and special-purpose parallel computations, respectively. We explore the Block PRAM model as a theoretical basis for the design of scalable general purpose parallel computers. Several simulation results are presented which show the Block PRAM model to be comparable to, and competitive with, other models that have been proposed for this role. Two major advantages of machines based on the Block PRAM model are that they preserve the granularity properties of individual algorithms and can efficiently incorporate a significant degree of fault tolerance. The thesis also discusses methods for the design of algorithms that do not use synchronization. We apply these methods to define fast circuits for several fundamental Boolean functions.
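A minimal sketch of the access cost the abstract describes, assuming the usual formulation of the Block PRAM in which touching a block of b consecutive shared-memory cells costs roughly l + b time for a latency parameter l (the parameter name and the numbers below are ours):

```python
# Assumed Block PRAM cost model: one access to a block of length b costs l + b.

def access_cost(block_lengths, latency):
    """Total cost of a sequence of block accesses of the given lengths."""
    return sum(latency + b for b in block_lengths)

if __name__ == "__main__":
    l = 32
    # Moving 1024 words as one long block vs. 1024 single-word accesses:
    print(access_cost([1024], l))      # 32 + 1024 = 1056
    print(access_cost([1] * 1024, l))  # 1024 * (32 + 1) = 33792
    # The gap is why the model rewards coarse-grained (long-block) algorithms.
```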
2004
The design of parallel programs requires solutions that do not arise in sequential programming, so a designer of parallel applications must ensure the correct behavior of all the processes that the program comprises. There are different solutions to each problem, but the question is to find one that is general. One possibility is to allow the use of asynchronous groups of processors. We present a general methodology for deriving efficient parallel divide-and-conquer algorithms. Algorithms in this class allow arbitrary division of the processor subsets, making it easier for the underlying software to divide the network into independent subnetworks and minimizing the impact of traffic in the rest of the network on the predicted cost. This methodology is defined by the OTMP model, and its expressiveness is exemplified through three divide-and-conquer programs.
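As an illustrative skeleton only (this is not the OTMP model itself, and all names are ours): a divide-and-conquer driver in which each recursive call is handed a disjoint slice of the processor set, the property the abstract credits with letting the runtime split the network into independent subnetworks.

```python
# Illustrative divide-and-conquer skeleton over an explicit processor subset.

def dc(problem, procs, divide, conquer, combine, threshold=1):
    """Solve `problem` using the processor subset `procs`; each recursive call
    receives a disjoint slice of `procs`, so the halves are independent."""
    if len(procs) <= threshold or len(problem) <= 1:
        return conquer(problem)
    left, right = divide(problem)
    mid = len(procs) // 2  # arbitrary split of the processor subset
    a = dc(left,  procs[:mid], divide, conquer, combine, threshold)
    b = dc(right, procs[mid:], divide, conquer, combine, threshold)
    return combine(a, b)

if __name__ == "__main__":
    # Mergesort as the classic instance, run here sequentially for clarity.
    merge = lambda a, b: sorted(a + b)
    halve = lambda xs: (xs[:len(xs) // 2], xs[len(xs) // 2:])
    print(dc([5, 2, 9, 1, 7, 3], list(range(4)), halve, sorted, merge))
```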
IEEE Transactions on Software Engineering, 1981
Lafayette, IN, a position he has held since 1976. His research interests include algorithms for data storage and retrieval, programming languages, and software engineering. Dr. Comer is a member of the Association for Computing Machinery and Sigma Xi.
… , University of California at Berkeley, Technical Report No. …
Theoretical Computer Science, 1990
Symposium on the Theory of Computing, 1982
Parallel Computing - Fundamentals and Applications - Proceedings of the International Conference ParCo99, 2000
Theory of Computing Systems / Mathematical Systems Theory, 1999
Computers in Physics, 1989
Journal of Parallel and Distributed Computing, 1992
Lecture Notes in Computer Science, 2002
21st Annual Symposium on Foundations of Computer Science (sfcs 1980), 1980