2001, Microelectronics Journal
Creating this document (i.e., typing this document into MS Word) took approximately one staff day of effort by one of the authors. Therefore, while this document is somewhat extensive, we anticipate that the changes to the text will require no more than 0.5 staff days of effort by the publisher, excluding figures. Several figures require revisions, which we anticipate should take an additional 0.5 staff days of effort by the publisher. Overall, we anticipate that with minimal effort on the part of the publisher, a significantly enhanced version of the text will be available.
Springer eBooks, 1999
The copyright owner's consent does not include copying for general distribution, promotion, new works, or resale. In these cases, specific written permission must first be obtained from the publisher. Production managed by Allan Abrams; manufacturing supervised by Jeffrey Taub. Camera-ready copy prepared by the IMA.
2008
* Copyright 2007, Uzi Vishkin. These class notes reflect the theoretical part of the Parallel Algorithms course at UMD. The parallel programming part and its computer architecture context within the PRAM-On-Chip Explicit Multi-Threading (XMT) platform are provided through the XMT home page www.umiacs.umd.edu/users/vishkin/XMT and the class home page. Comments are welcome: please write to me using my last name at umd.edu
Procedia Computer …, 2011
Lecture Notes in Computer Science, 2002
The emerging discipline of algorithm engineering has primarily focused on transforming pencil-and-paper sequential algorithms into robust, efficient, well tested, and easily used implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.
SIAM Journal on Computing, 1989
Abstract. Techniques for parallel divide-and-conquer are presented, resulting in improved parallel algorithms for a number of problems. The problems for which improved algorithms are given include segment intersection detection, trapezoidal decomposition, and planar point location. Efficient parallel algorithms are also given for fractional cascading, three-dimensional maxima, two-set dominance counting, and visibility from a point. All of the algorithms presented run in O(log n) time with either a linear or a sublinear number of processors in the CREW PRAM model. Key words. parallel algorithms, parallel data structures, divide-and-conquer, computational geometry, fractional cascading, visibility, planar point location, trapezoidal decomposition, dominance, intersection detection. AMS(MOS) subject classifications. 68E05, 68C05, 68C15. 1. Introduction. This paper presents a number of general techniques for parallel divide-and-conquer. These techniques are based on nontrivial generalizations of Cole's recent parallel merge sort result [13] and enable us to achieve improved complexity bounds for a large number of problems. In particular, our techniques can be applied
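The cascading technique of the paper is more involved, but the underlying divide-and-conquer pattern it generalizes can be sketched in a few lines. The following is an illustrative Python sketch (all names are ours, not from the paper): the two recursive halves are independent and are submitted to a thread pool in parallel, while the merge step is done sequentially; Cole's parallel merge sort replaces that sequential merge with an O(1)-round parallel merge.

```python
from concurrent.futures import ThreadPoolExecutor

def merge(a, b):
    # Sequential merge of two sorted lists. In Cole's algorithm this step
    # runs in O(1) parallel rounds via cascaded samples; we do not model that.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def dc_sort(xs):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    # "Divide" step: the two halves share no data, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=2) as ex:
        left = ex.submit(dc_sort, xs[:mid])
        right = ex.submit(dc_sort, xs[mid:])
        return merge(left.result(), right.result())
```

With a truly parallel merge, the recurrence T(n) = T(n/2) + O(1) gives the O(log n) time bound the abstract states; with the sequential merge above, the merge step dominates.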
Abstract: This paper briefly discusses the general class of algorithms that can be implemented using parallel constructions. Common characteristics of these algorithms are also described in order to provide a generic representation for parallel algorithms. In addition, it describes the parallel pattern in terms of image scannings and its relationship within mathematical morphology. Such a pattern is essential for the development of morphological operators and operations. Examples of the application of the...
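The image-scanning pattern the abstract refers to can be illustrated with a minimal morphological operator. The sketch below (our own, not from the paper) implements binary erosion with a 3x3 square structuring element over a flat pixel array; because every output pixel is computed by the same local rule from a fixed neighborhood, all pixels could be processed in parallel, which is what makes the scanning pattern amenable to parallel construction.

```python
def erode(img, h, w):
    # img: flat list of 0/1 pixels, h rows by w cols.
    # 3x3 square structuring element; border pixels are left at 0.
    # Each output pixel depends only on its 3x3 neighborhood, so the
    # scan over (y, x) is embarrassingly parallel.
    out = [0] * (h * w)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y * w + x] = min(img[(y + dy) * w + (x + dx)]
                                 for dy in (-1, 0, 1)
                                 for dx in (-1, 0, 1))
    return out
```

Dilation follows by replacing `min` with `max`; compound operators (opening, closing) are sequences of such scans.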
In this report we describe the integration of the parallel direct selected multireference CI code Diesel into GAMESSUK. This has been done in order to introduce a parallel version of this method into GAMESSUK. Most of the time has been spent on updating the C++ code in which Diesel is written. The scaling of the resulting code is shown to be satisfactory, and is expected to be very good for large calculations. The project has been funded by NCF, project number NRG2004.06.
Investigations of several subproblems in the area of derivation of parallel programs were continued during the current quarter. These investigations include: 1. Derivation of various parallel algorithms, parallel graph connectivity and parallel list ranking (with student, Doreen Yen), 2. Automatic Parallel Compilation from segmented straight line programs (with student, Lorrie Tomek), 3. Derivation of pipelined algorithms on small networks (with student, Steve Tate), 4. Programming Languages: Common Prototype Language (CPL), developed by student Lars Nyland and Professors Jan Prins, Robert Wagner and John Reif. CPL uses UNITY Primitives.
Republic. A revised version, which will appear in a volume published by the IEEE Computer Society Press, appears as Technical Report Number C.Sc. 93-17, available via anonymous ftp from "cs.umr.edu".
This book grew out of lecture notes for a course on parallel algorithms that I gave at Drexel University over a period of several years. I was frustrated by the lack of texts that had the focus that I wanted. Although the book also addresses some architectural issues, the main focus is on the development of parallel algorithms on "massively parallel" computers. This book could be used in several versions of a course on Parallel Algorithms. We tend to focus on SIMD parallel algorithms in several general areas of application:
2006
We present a new parallel computation model called the Parallel Resource-Optimal computation model. PRO is a framework being proposed to enable the design of efficient and scalable parallel algorithms in an architecture-independent manner, and to simplify the analysis of such algorithms. A focus on three key features distinguishes PRO from existing parallel computation models. First, the design and analysis of a parallel algorithm in the PRO model is performed relative to the time and space complexity of a sequential reference algorithm. Second, a PRO algorithm is required to be both time- and space-optimal relative to the reference sequential algorithm. Third, the quality of a PRO algorithm is measured by the maximum number of processors that can be employed while optimality is maintained. Inspired by the Bulk Synchronous Parallel model, an algorithm in the PRO model is organized as a sequence of supersteps. Each superstep consists of distinct computation and communication phases, but the supersteps are not required to be separated by synchronization barriers. Both computation and communication costs are accounted for in the runtime analysis of a PRO algorithm. Experimental results on parallel algorithms designed using the PRO model, and implemented using its accompanying programming environment SSCRAP, demonstrate that the model indeed delivers efficient and scalable implementations on a wide range of platforms.
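The superstep organization PRO borrows from BSP can be illustrated with a tiny reduction, sketched here sequentially (the function name and decomposition are ours, and this is a simulation, not the paper's SSCRAP environment): each "processor" performs a local computation phase, the partial results form the communication phase, and a second superstep combines them.

```python
def parallel_sum(data, p):
    # Superstep 1, computation phase: each of p "processors" sums its
    # own contiguous chunk locally -- O(n/p) work per processor.
    chunk = (len(data) + p - 1) // p
    partials = [sum(data[i * chunk:(i + 1) * chunk]) for i in range(p)]
    # Communication phase: the p partial sums are sent to processor 0
    # (here, simply collected in the `partials` list) -- O(p) messages.
    # Superstep 2: processor 0 combines the partial results.
    return sum(partials)
```

In PRO terms, the algorithm is time-optimal relative to the sequential sum as long as the O(p) communication term stays dominated by the O(n/p) computation term, which bounds the number of processors that can be employed while optimality is maintained.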
Undergraduate Topics in Computer Science, 2018
Undergraduate Topics in Computer Science (UTiCS) delivers high-quality instructional content for undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one-or two-semester course. The texts are all authored by established experts in their fields, reviewed by an international advisory board, and contain numerous examples and problems. Many include fully worked solutions.
Priority queues are used in many applications including real-time systems, operating systems, and simulations. Their implementation may have a profound effect on the performance of such applications. In this article, we study the performance of well-known sequential priority queue implementations and the recently proposed parallel access priority queues. To accurately assess the performance of a priority queue, the performance measurement methodology must be appropriate. We use the Classic Hold, the Markov Model, and an Up/Down access pattern to measure performance and look at both the average access time and the worst-case time that are of vital interest to real-time applications. Our results suggest that the best choice for priority queue algorithms depends heavily on the application. For queue sizes smaller than 1,000 elements, the Splay Tree, the Skew Heap, and Henriksen's algorithm show good average access times. For large queue sizes of 5,000 elements or more, the Calendar Queue and the Lazy Queue offer good average access times but have very long worst-case access times. The Skew Heap and the Splay Tree exhibit the best worst-case access times. Among the parallel access priority queues tested, the Parallel Access Skew Heap provides the best performance on small shared memory multiprocessors.
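The Classic Hold methodology mentioned above can be sketched concisely: fill the queue to a steady-state size, then repeatedly dequeue the minimum and enqueue a replacement with a later priority, so the queue size stays constant while measuring per-operation cost. The sketch below (our own illustration, using Python's binary-heap `heapq` rather than any of the structures benchmarked in the article) records both the average and the worst-case access time, the two metrics the study considers.

```python
import heapq
import random
import time

def classic_hold(queue_size=1000, ops=10000, seed=0):
    # Fill to steady state: queue_size elements with random priorities.
    rng = random.Random(seed)
    pq = [rng.random() for _ in range(queue_size)]
    heapq.heapify(pq)
    worst = 0.0
    start = time.perf_counter()
    for _ in range(ops):
        t0 = time.perf_counter()
        prio = heapq.heappop(pq)                  # dequeue the minimum
        heapq.heappush(pq, prio + rng.random())   # hold: re-enqueue later event
        worst = max(worst, time.perf_counter() - t0)
    avg = (time.perf_counter() - start) / ops
    return avg, worst
```

Running this at several queue sizes reproduces the shape of the study's question: structures with good amortized averages (e.g. self-adjusting ones) can still show long individual operations, which is why the article reports worst-case times separately for real-time use.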
2000
Large-scale parallel computations are more common than ever, due to the increasing availability of multi-processor systems. However, writing parallel software is often a complicated and error-prone task. To relieve Diffpack users of the tedious and low-level technical details of parallel programming, we have designed a set of new software modules, tools, and programming rules, which will be the topic of
2016
In this paper, we provide a qualitative and quantitative analysis of the performance of parallel algorithms on modern multi-core hardware. We attempt to show a comparative study of the performances of algorithms (traditionally perceived as sequential in nature) in a parallel environment, using the Message Passing Interface (MPI), based on Amdahl’s Law. First, we study sorting algorithms. Sorting is a fundamental problem in computer science, and one where there is a limit on the efficiency of algorithms that exist. In theory, sorting contains a large amount of parallelism, and it should not be difficult to accelerate sorting of very large datasets on modern architectures. Unfortunately, most serial sorting algorithms do not lend themselves to easy parallelization, especially in a distributed memory system such as we might use with MPI. While initial results show a promising speedup for sorting algorithms, owing to inter-process communication latency, we see a slower run-time overall, with incr...
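Amdahl's Law, which this abstract uses as its analytical baseline, bounds the speedup of a program whose serial fraction cannot be parallelized. A minimal sketch (ours, not the paper's code):

```python
def amdahl_speedup(parallel_fraction, n_procs):
    # Amdahl's Law: speedup = 1 / ((1 - P) + P / N), where P is the
    # parallelizable fraction of the work and N the number of processes.
    # As N grows, speedup approaches the ceiling 1 / (1 - P).
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)
```

For example, a program that is 90% parallelizable cannot exceed a 10x speedup no matter how many MPI processes are used, which is consistent with the abstract's observation that communication latency can erase the gains of parallel sorting.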