A revised version, which will appear in a volume published by the IEEE Computer Society Press, appears as Technical Report Number C.Sc. 93-17, available via anonymous ftp from cs.umr.edu.
This book grew out of lecture notes for a course on parallel algorithms that I gave at Drexel University over a period of several years. I was frustrated by the lack of texts with the focus I wanted. Although the book also addresses some architectural issues, the main focus is on the development of parallel algorithms on "massively parallel" computers. This book could be used in several versions of a course on Parallel Algorithms. We tend to focus on SIMD parallel algorithms in several general areas of application.
Journal of Communications and Networks, 2014
The high intensity of research and modeling in the fields of mathematics, physics, biology, and chemistry requires new computing resources. Because of the high computational complexity of such tasks, computing time is long and costly. The most efficient way to increase performance is to adopt parallel principles. The purpose of this paper is to present the issue of parallel computing, with emphasis on the analysis of parallel systems and on the impact of communication delays on their efficiency and overall execution time. The paper focuses on finite algorithms for solving systems of linear equations, namely matrix manipulation (Gaussian elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, OpenMP), distributed memory (message passing interface, MPI), and for their combination (MPI + OpenMP). The properties of the algorithms were determined analytically and verified experimentally. Conclusions are drawn for theory and practice.
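As a rough illustration of the shared-memory variant, the forward-elimination loop of GEM parallelizes naturally over rows. The sketch below is my own minimal rendering (no pivoting, not the authors' code): row updates below the pivot are independent and are distributed with an OpenMP parallel for; in a distributed-memory version, the implicit barrier would become an MPI_Bcast of the pivot row.

```c
/* Minimal sketch of shared-memory Gaussian elimination (GEM), assuming
 * a dense n x n matrix stored row-major in a[] with right-hand side b[].
 * No pivoting, for brevity; illustrative only. Compile with -fopenmp. */
#include <omp.h>

void gem_forward(double *a, double *b, int n)
{
    for (int k = 0; k < n - 1; k++) {              /* pivot column */
        /* rows below the pivot are independent: update them in parallel */
        #pragma omp parallel for schedule(static)
        for (int i = k + 1; i < n; i++) {
            double m = a[i * n + k] / a[k * n + k]; /* elimination multiplier */
            for (int j = k; j < n; j++)
                a[i * n + j] -= m * a[k * n + j];
            b[i] -= m * b[k];
        }
        /* the implicit barrier here is the synchronization cost analyzed
         * in the paper; the MPI variant instead broadcasts the pivot row */
    }
}
```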
Theoretical Computer Science, 1990
Abstract. This paper outlines a theory of parallel algorithms that emphasizes two crucial aspects of parallel computation: speedup, the improvement in running time due to parallelism, and efficiency, the ratio of work done by a parallel algorithm to the work done by a sequential algorithm. We define six classes of algorithms in these terms; of particular interest is the class EP, of algorithms that achieve a polynomial speedup with constant efficiency. The relations between these classes are examined. We investigate the robustness of these classes across various models of parallel computation. To do so, we examine simulations across models where the simulating machine may be smaller than the simulated machine. These simulations are analyzed with respect to their efficiency and to the reduction in the number of processors. We show that a large number of parallel computation models are related via efficient simulations, if a polynomial reduction of the number of processors is allowed. This implies that the class EP is invariant across all these models. Many open problems motivated by our approach are listed.

1. Introduction. As parallel computers become increasingly available, a theory of parallel algorithms is needed to guide the design of algorithms for such machines. To be useful, such a theory must address two major concerns in parallel computation, namely speedup and efficiency. It should classify algorithms and problems into a few, meaningful classes that are, to the largest extent possible, model independent. This paper outlines an approach to the analysis of parallel algorithms that we feel answers these concerns without sacrificing too much generality or abstractness. We propose a classification of parallel algorithms in terms of parallel running time and inefficiency, which is the extra amount of work done by a parallel algorithm as compared to a sequential algorithm. Both running time and inefficiency are measured as a function of the sequential running time, which is used as a yardstick. * A preliminary version of this paper was presented at the 15th International Colloquium on Automata,
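In the notation the abstract suggests (the definitions below are my reconstruction, with T_1(n) the sequential running time and T_p(n) the running time on p processors):

```latex
S_p(n) \;=\; \frac{T_1(n)}{T_p(n)} \quad\text{(speedup)},
\qquad
E_p(n) \;=\; \frac{T_1(n)}{p\,T_p(n)} \quad\text{(efficiency)},
\qquad
\mathrm{EP} \;=\; \bigl\{\,\text{algorithms with } S_p(n) \ge T_1(n)^{\varepsilon}
\text{ for some } \varepsilon > 0 \text{ and } E_p(n) = \Theta(1)\,\bigr\}.
```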
2018
Traditional parallel data-processing algorithms partition the information to be processed into parts (according to some principle: the number of cores, processing speed, etc.) and then process each part separately on a different core (or processor). Partitioning into parts takes quite a long time and is not always an optimal solution: it is impossible to reduce the idle time of cores to a minimum, and it is not always possible to find the optimal partition (or to change the algorithm during execution). The algorithms we propose follow the main principle that processing by the cores should be performed in parallel while keeping core idle time to a minimum. The article reviews two algorithms that work according to this principle: "smart-delay" and a development of the transposed ribbon-like matrix multiplication algorithm.
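The idle-core problem the authors describe is commonly attacked by handing out work dynamically rather than pre-partitioning it. The sketch below is my illustration of that general principle, not the paper's "smart-delay" algorithm: with OpenMP's dynamic schedule, a core that finishes early immediately claims the next chunk instead of standing idle.

```c
/* Dynamic work distribution sketch (my illustration, not the paper's
 * algorithm): items are claimed on demand in chunks of 16, so uneven
 * per-item costs do not leave cores idle waiting for a fixed partition. */
#include <omp.h>

void process_items(double *items, int n, void (*work)(double *))
{
    #pragma omp parallel for schedule(dynamic, 16)
    for (int i = 0; i < n; i++)
        work(&items[i]);
}
```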
Journal of Parallel and Distributed Computing, 1993
There are several metrics that characterize the performance of a parallel system, such as parallel execution time, speedup, and efficiency. A number of properties of these metrics have been studied. For example, it is a well-known fact that, given a parallel architecture and a problem of a fixed size, the speedup of a parallel algorithm does not continue to increase with an increasing number of processors. It usually tends to saturate or peak at a certain limit. Thus it may not be useful to employ more than an optimal number of processors for solving a problem on a parallel computer. This optimal number of processors depends on the problem size, the parallel algorithm, and the parallel architecture. In this paper we study the impact of parallel processing overheads and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time. We then study a more general criterion of optimality and show how operating at the optimal point is equivalent to operating at a unique value of efficiency which is characteristic of the criterion of optimality and the properties of the parallel system under study. We put the technical results derived in this paper in perspective with similar results that have appeared in the literature before and show how this paper generalizes and/or extends these earlier results.
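A standard way to make the saturation effect concrete (my formulation, not taken from the paper): write the parallel time for work W with an explicit overhead term T_o(p) and minimize over p. For linear overhead T_o(p) = cp, the time-optimal processor count corresponds to a fixed efficiency of 1/2, illustrating the paper's point that an optimality criterion pins down a characteristic efficiency value.

```latex
T_p = \frac{W}{p} + T_o(p), \qquad
\frac{dT_p}{dp} = 0 \;\Longrightarrow\; \frac{W}{p^2} = T_o'(p);
\quad\text{for } T_o(p) = c\,p:\;\;
p^\ast = \sqrt{W/c},\;\;
T_{p^\ast} = 2\sqrt{Wc},\;\;
E(p^\ast) = \frac{W}{p^\ast \, T_{p^\ast}} = \frac{1}{2}.
```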
2016
In this paper, we provide a qualitative and quantitative analysis of the performance of parallel algorithms on modern multi-core hardware. We present a comparative study of the performance of algorithms (traditionally perceived as sequential in nature) in a parallel environment, using the Message Passing Interface (MPI), based on Amdahl's Law. First, we study sorting algorithms. Sorting is a fundamental problem in computer science, and one where there is a limit on the efficiency of existing algorithms. In theory it contains a large amount of parallelism, and it should not be difficult to accelerate the sorting of very large datasets on modern architectures. Unfortunately, most serial sorting algorithms do not lend themselves to easy parallelization, especially in a distributed memory system such as we might use with MPI. While initial results show a promising speedup for sorting algorithms, owing to inter-process communication latency we see a slower overall run-time with incr...
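The Amdahl bound the study builds on: if a fraction f of the work is parallelizable and p processes are used, the speedup is capped regardless of p, which is why communication latency can erase the expected gains for distributed sorting:

```latex
S(p) \;=\; \frac{1}{(1 - f) + f/p} \;\le\; \frac{1}{1 - f}.
```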
Undergraduate Topics in Computer Science, 2018
Undergraduate Topics in Computer Science (UTiCS) delivers high-quality instructional content for undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or two-semester course. The texts are all authored by established experts in their fields, reviewed by an international advisory board, and contain numerous examples and problems. Many include fully worked solutions.
2006
We present a new parallel computation model called the Parallel Resource-Optimal (PRO) computation model. PRO is a framework being proposed to enable the design of efficient and scalable parallel algorithms in an architecture-independent manner, and to simplify the analysis of such algorithms. A focus on three key features distinguishes PRO from existing parallel computation models. First, the design and analysis of a parallel algorithm in the PRO model is performed relative to the time and space complexity of a sequential reference algorithm. Second, a PRO algorithm is required to be both time- and space-optimal relative to the reference sequential algorithm. Third, the quality of a PRO algorithm is measured by the maximum number of processors that can be employed while optimality is maintained. Inspired by the Bulk Synchronous Parallel model, an algorithm in the PRO model is organized as a sequence of supersteps. Each superstep consists of distinct computation and communication phases, but the supersteps are not required to be separated by synchronization barriers. Both computation and communication costs are accounted for in the runtime analysis of a PRO algorithm. Experimental results on parallel algorithms designed using the PRO model, and implemented using its accompanying programming environment SSCRAP, demonstrate that the model indeed delivers efficient and scalable implementations on a wide range of platforms.
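The superstep structure described here can be sketched as a loop alternating a local computation phase and a communication phase. The skeleton below is a hypothetical MPI rendering of that structure (not SSCRAP code): each superstep computes locally, then exchanges a boundary value on a ring, with no explicit global barrier between supersteps.

```c
/* Hypothetical BSP/PRO-style superstep skeleton (my illustration, not
 * SSCRAP code): local computation phase, then a communication phase
 * exchanging one value with ring neighbors. */
#include <mpi.h>

void run_supersteps(MPI_Comm comm, int nsteps)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    double local = (double)rank, incoming = 0.0;

    for (int s = 0; s < nsteps; s++) {
        local = local * 0.5 + 1.0;                    /* computation phase (stand-in) */

        int right = (rank + 1) % size;
        int left  = (rank + size - 1) % size;
        MPI_Sendrecv(&local, 1, MPI_DOUBLE, right, 0, /* communication phase */
                     &incoming, 1, MPI_DOUBLE, left, 0,
                     comm, MPI_STATUS_IGNORE);
        local += incoming;  /* both phases are charged to the superstep's
                               runtime; no barrier separates supersteps */
    }
}
```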
Evaluating how well a whole system or a set of subsystems performs is one of the primary objectives of performance testing. Performance assessment tells us whether the architecture implementation meets the design objectives. Performance evaluations of several parallel algorithms are compared in this study. Both theoretical and experimental methods are used in performance assessment as a subdiscipline of computer science. The parallel method outperforms its sequential counterpart in terms of throughput. The parallel algorithm's performance (speedup) is examined, as shown in the results.
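Speedup in such experimental studies is typically measured as the ratio of sequential to parallel wall-clock time. A minimal timing harness might look like the following (my sketch, using a dummy workload in place of the study's algorithms):

```c
/* Minimal speedup-measurement sketch (my illustration): time the same
 * dummy workload serially and with OpenMP, then report S = T_seq/T_par. */
#include <stdio.h>
#include <omp.h>

#define N 100000000L

static double work_serial(void)
{
    double s = 0.0;
    for (long i = 0; i < N; i++) s += 1.0 / (double)(i + 1);
    return s;
}

static double work_parallel(void)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (long i = 0; i < N; i++) s += 1.0 / (double)(i + 1);
    return s;
}

int main(void)
{
    double t0 = omp_get_wtime();
    double r1 = work_serial();
    double t_seq = omp_get_wtime() - t0;

    t0 = omp_get_wtime();
    double r2 = work_parallel();
    double t_par = omp_get_wtime() - t0;

    printf("results %.6f %.6f, speedup = %.2f\n", r1, r2, t_seq / t_par);
    return 0;
}
```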
Siam Journal on Computing, 1989
Abstract. Techniques for parallel divide-and-conquer are presented, resulting in improved parallel algorithms for a number of problems. The problems for which improved algorithms are given include segment intersection detection, trapezoidal decomposition, and planar point location. Efficient parallel algorithms are also given for fractional cascading, three-dimensional maxima, two-set dominance counting, and visibility from a point. All of the algorithms presented run in O(log n) time with either a linear or a sublinear number of processors in the CREW PRAM model. Key words. parallel algorithms, parallel data structures, divide-and-conquer, computational geometry, fractional cascading, visibility, planar point location, trapezoidal decomposition, dominance, intersection detection. AMS(MOS) subject classifications. 68E05, 68C05, 68C15. 1. Introduction. This paper presents a number of general techniques for parallel divide-and-conquer. These techniques are based on nontrivial generalizations of Cole's recent parallel merge sort result [13] and enable us to achieve improved complexity bounds for a large number of problems. In particular, our techniques can be applied
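As a small shared-memory illustration of the parallel divide-and-conquer pattern (my sketch in OpenMP tasks; the paper's PRAM techniques, built on Cole-style parallel merging, are far more refined), a task-parallel mergesort recurses on independent halves in parallel and merges sequentially:

```c
/* Task-parallel mergesort: the two recursive halves are independent and
 * run as OpenMP tasks; the merge here is sequential (parallel merging is
 * the hard part the paper's techniques address). Compile with -fopenmp. */
#include <stdlib.h>
#include <string.h>

static void merge(int *a, int *tmp, int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof(int));
}

static void msort(int *a, int *tmp, int lo, int hi)
{
    if (hi - lo < 2048) {                  /* small cutoff: sort serially */
        for (int i = lo + 1; i < hi; i++)
            for (int j = i; j > lo && a[j - 1] > a[j]; j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
        return;
    }
    int mid = lo + (hi - lo) / 2;
    #pragma omp task shared(a, tmp)        /* conquer halves in parallel */
    msort(a, tmp, lo, mid);
    msort(a, tmp, mid, hi);
    #pragma omp taskwait
    merge(a, tmp, lo, mid, hi);
}

void parallel_mergesort(int *a, int n)
{
    int *tmp = malloc((size_t)n * sizeof(int));
    #pragma omp parallel
    #pragma omp single                     /* one task tree, many workers */
    msort(a, tmp, 0, n);
    free(tmp);
}
```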
Lecture Notes in Computer Science, 2012
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
IEEE Transactions on Software Engineering, 1981
Lafayette, IN, a position he has held since 1976. His research interests include algorithms for data storage and retrieval, programming languages, and software engineering. Dr. Comer is a member of the Association for Computing Machinery and Sigma Xi.
1997
This book is a must for anyone interested in entering the fascinating new world of parallel optimization using parallel processors, computers capable of doing an enormous number of complex operations in a nanosecond. The authors are among the pioneers of this fascinating new world, and they tell us what new applications they explored, what algorithms appear to work best, how parallel processors differ in their design, and what the comparative results were using different types of algorithms on different types of parallel processors to solve them. According to an old adage, the whole can sometimes be much more than the sum of its parts. I am thoroughly in agreement with the authors' belief in the added value of bringing together Applications, Mathematical Algorithms and Parallel Computing techniques. This is exactly what they found true in their own research and report on in the book. Many years ago, I, too, experienced the thrill of combining three diverse disciplines: the Application (in my case Linear Programs), the Solution Algorithm (the Simplex Method), and the then New Tool (the Serial Computer). The union of the three made possible the optimization of many real-world problems. Parallel processors are the new generation, and they have the power to tackle applications which require solution in real time, or have model parameters which are not known with certainty, or have a vast number of variables and constraints. Image restoration, tomography, radiation therapy, finance, industrial planning, transportation and economics are the sources for many of the interesting practical problems used by the authors to test the methodology.
1995
Description/Abstract The best enterprises have both a compelling need pulling them forward and an innovative technological solution pushing them on. In high-performance computing, we have the need for increased computational power in many applications and the inevitable long-term solution is massive parallelism. In the short term, the relation between pull and push may seem unclear as novel algorithms and software are needed to support parallel computing.
Position Papers of the 2016 Federated Conference on Computer Science and Information Systems, 2016
Verifying the correctness of parallel algorithms is not trivial, and it is usually omitted in works from the parallel computation field. In this paper, we discuss in detail how to show that a certain parallel algorithm is correct. This process involves proving its safety and liveness. We perform an in-depth analysis of our parallel guided ejection search (P-GES) for the pickup and delivery problem with time windows, which serves as an excellent case study. P-GES was implemented as a distributed algorithm using the Message Passing Interface library with asynchronous communications, and was validated using the well-known Li and Lim benchmark containing demanding test instances. We have already proved the efficacy of this algorithm and shown that it can retrieve very high-quality routing schedules (quite often better than the world's best at that time).
Lecture Notes in Computer Science, 2002
The emerging discipline of algorithm engineering has primarily focused on transforming pencil-and-paper sequential algorithms into robust, efficient, well tested, and easily used implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.
Distributed and Parallel Databases - DPD, 1997
A parallel scheme using the divide-and-conquer method is developed. This partitions the input set of a problem into subsets, computes a partial result from each subset, and finally employs a merging function to obtain the final answer. Based on a linear recursive program as a tool for formalism, a precise characterization for problems to be parallelized by the divide-and-conquer method is obtained. The performance of the parallel scheme is analyzed, and a necessary and sufficient condition to achieve linear speedup is obtained. The parallel scheme is generalized to include parameters, and a real application, the fuzzy join problem, is discussed in detail using the generalized scheme.
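The scheme can be phrased generically: partition the input, compute a partial result per subset independently, then merge. The sketch below is my own minimal shared-memory rendering, specialized to summation for concreteness (the paper formalizes the scheme with linear recursive programs and applies it to the fuzzy join problem); linear speedup hinges on the merge step being cheap relative to the partial computations, which holds here since the merge is a single associative addition per subset.

```c
/* Generic divide-and-conquer scheme from the abstract, specialized to
 * summation: partition the input into nparts subsets, compute a partial
 * result per subset in parallel, then merge the partial results. */
#include <omp.h>

double dc_scheme(const double *x, int n, int nparts)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (int p = 0; p < nparts; p++) {
        int lo = (int)((long long)n * p / nparts);        /* subset bounds */
        int hi = (int)((long long)n * (p + 1) / nparts);
        double partial = 0.0;               /* partial result from subset p */
        for (int i = lo; i < hi; i++)
            partial += x[i];
        total += partial;                   /* merge step (associative +) */
    }
    return total;
}
```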