2004
The design of parallel programs requires techniques that have no counterpart in sequential programming. A designer of parallel applications is therefore concerned with ensuring the correct behavior of all the processes that the program comprises. Each problem admits different solutions, but the challenge is to find one that is general. One possibility is to allow the use of asynchronous groups of processors. We present a general methodology for deriving efficient parallel divide-and-conquer algorithms. Algorithms in this class allow arbitrary division of the processor subsets, which makes it easier for the underlying software to divide the network into independent subnetworks and minimizes the impact of traffic in the rest of the network on the predicted cost. The methodology is defined by the OTMP model, and its expressiveness is illustrated through three divide-and-conquer programs.
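To make the idea concrete, here is a minimal sketch (ours, not the OTMP API, which the abstract does not detail) of divide and conquer in which the processor subset is divided along with the problem. Group splitting is emulated with std::async, and the split ratio k of the processor subset is arbitrary, as the abstract describes; the range-sum problem is only a placeholder:

```cpp
#include <future>
#include <numeric>
#include <vector>
#include <iostream>

// Hypothetical problem: sum a range. `procs` is the size of the processor subset.
long long dac_sum(const std::vector<int>& v, size_t lo, size_t hi, int procs) {
    if (procs <= 1 || hi - lo < 2)            // one processor left: solve sequentially
        return std::accumulate(v.begin() + lo, v.begin() + hi, 0LL);
    int k = procs / 2;                        // arbitrary split of the processor subset
    size_t mid = lo + (hi - lo) * k / procs;  // split the data proportionally
    auto left = std::async(std::launch::async,
                           dac_sum, std::cref(v), lo, mid, k);
    long long right = dac_sum(v, mid, hi, procs - k);
    return left.get() + right;
}

int main() {
    std::vector<int> v(1000, 1);
    std::cout << dac_sum(v, 0, v.size(), 4) << "\n";  // prints 1000
}
```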
International Conference on Parallel Processing, 1990
The Computer Journal, 2001
This paper presents a general data-parallel formulation for a class of problems based on the divide-and-conquer strategy. A combination of three techniques (mapping vectors, index-digit permutations, and space-filling curves) is used to reorganize the algorithmic dataflow, providing great flexibility to exploit data locality efficiently and to reduce and optimize communications. In addition, these techniques allow the reorganized dataflows to be translated easily into HPF (High Performance Fortran) constructs. Finally, experimental results on the Cray T3E validate our method.
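As a small illustration of one of the three techniques, the sketch below applies a bit-reversal reordering, the simplest index-digit permutation (the base-2 digits of each index are reversed, the data reorganization used by radix-2 FFT dataflows). The general framework of mapping vectors and space-filling curves is not reproduced here:

```cpp
#include <cstddef>
#include <vector>
#include <iostream>

// Permute a of length 2^bits so that the element at index i moves to rev(i).
void bit_reverse_permute(std::vector<int>& a, unsigned bits) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::size_t r = 0;
        for (unsigned b = 0; b < bits; ++b)      // reverse the index digits
            if (i & (std::size_t{1} << b)) r |= std::size_t{1} << (bits - 1 - b);
        if (r > i) std::swap(a[i], a[r]);        // swap each pair exactly once
    }
}

int main() {
    std::vector<int> a{0, 1, 2, 3, 4, 5, 6, 7};
    bit_reverse_permute(a, 3);
    for (int x : a) std::cout << x << ' ';       // prints: 0 4 2 6 1 5 3 7
}
```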
Mathematics
1995
This paper studies the runtime behaviour of various parallel divide-and-conquer algorithms written in a non-strict functional language when three common granularity control mechanisms are used: a simple cut-off, priority thread creation, and priority scheduling. These mechanisms use granularity information, currently provided via annotations, to improve the performance of the parallel programs.
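A minimal sketch of the first of the three mechanisms, the simple cut-off: below a depth threshold the recursion stays sequential, so thread-creation cost is paid only for coarse-grained subproblems. The Fibonacci example, the C++ rendering, and the threshold value are ours, not the paper's (which works in a non-strict functional language):

```cpp
#include <future>
#include <iostream>

long long fib(int n, int depth, int cutoff) {
    if (n < 2) return n;
    if (depth >= cutoff)                       // grain too fine: stay sequential
        return fib(n - 1, depth, cutoff) + fib(n - 2, depth, cutoff);
    auto left = std::async(std::launch::async, // coarse grain: spawn a task
                           fib, n - 1, depth + 1, cutoff);
    long long right = fib(n - 2, depth + 1, cutoff);
    return left.get() + right;
}

int main() { std::cout << fib(30, 0, 4) << "\n"; }  // prints 832040
```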
SIAM Journal on Computing, 1989
Abstract. Techniques for parallel divide-and-conquer are presented, resulting in improved parallel algorithms for a number of problems. The problems for which improved algorithms are given include segment intersection detection, trapezoidal decomposition, and planar point location. Efficient parallel algorithms are also given for fractional cascading, three-dimensional maxima, two-set dominance counting, and visibility from a point. All of the algorithms presented run in O(log n) time with either a linear or a sublinear number of processors in the CREW PRAM model. Key words. parallel algorithms, parallel data structures, divide-and-conquer, computational geometry, fractional cascading, visibility, planar point location, trapezoidal decomposition, dominance, intersection detection. AMS(MOS) subject classifications. 68E05, 68C05, 68C15. 1. Introduction. This paper presents a number of general techniques for parallel divide-and-conquer. These techniques are based on nontrivial generalizations of Cole's recent parallel merge sort result [13] and enable us to achieve improved complexity bounds for a large number of problems. In particular, our techniques can be applied
2010
The divide-and-conquer pattern of parallelism is a powerful approach to organizing parallelism for problems that are expressed naturally in a recursive way. In fact, recent tools such as Intel Threading Building Blocks (TBB), which has received much attention, go further and make extensive use of this pattern to parallelize problems that other approaches handle with different strategies. In this paper we discuss the limitations of expressing divide-and-conquer parallelism with the algorithm templates provided by TBB. Based on our observations, we propose a new algorithm template, implemented on top of TBB, that improves the programmability of many problems fitting this pattern while providing similar performance. This is demonstrated with a comparison in terms of both performance and programmability.
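For context, this is how divide and conquer is typically written directly on top of TBB's tbb::parallel_invoke, the kind of code a higher-level template would hide. The quicksort-style example, the median split via nth_element, and the cut-off value are ours, not the paper's:

```cpp
#include <tbb/parallel_invoke.h>
#include <algorithm>
#include <vector>
#include <iostream>

void par_qsort(std::vector<int>& v, long lo, long hi) {
    if (hi - lo < 1000) {                          // small subproblem: sort serially
        std::sort(v.begin() + lo, v.begin() + hi);
        return;
    }
    long mid = lo + (hi - lo) / 2;                 // split around the median element
    std::nth_element(v.begin() + lo, v.begin() + mid, v.begin() + hi);
    tbb::parallel_invoke(                          // the two halves run in parallel
        [&] { par_qsort(v, lo, mid); },
        [&] { par_qsort(v, mid, hi); });
}

int main() {
    std::vector<int> v(100000);
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = int((v.size() - i) % 1000);
    par_qsort(v, 0, long(v.size()));
    std::cout << std::boolalpha << std::is_sorted(v.begin(), v.end()) << "\n";  // true
}
```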
ICPP, 1983
Recent developments in integrated circuit technology have suggested a new building block for parallel processing systems: the single-chip computer. This building block makes it economically feasible to interconnect large numbers of computers to form a multimicrocomputer network. Because the nodes of such a network do not share any memory, it is crucial that an interconnection network capable of efficiently supporting message passing be found. We present a model of time-varying computation based on task precedence graphs that corresponds closely to the behavior of fork/join algorithms such as divide-and-conquer. Using this model, we investigate the behavior of five interconnection networks under varying workloads with distributed scheduling.
Concurrency: Practice and Experience, 1995
In this paper we present a framework for partitioning data-parallel computations across a heterogeneous metasystem at runtime. The framework is guided by program and resource information made available to the system. Three difficult problems are handled by the framework: processor selection, task placement, and heterogeneous data domain decomposition. Solving each of these problems contributes to reduced elapsed time. In particular, processor selection determines the best grain size at which to run the computation, task placement reduces communication cost, and data domain decomposition achieves processor load balance. We present results indicating that excellent performance is achievable using the framework. The paper extends our earlier work on partitioning data-parallel computations across a single-level network of heterogeneous workstations.
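A sketch of the data domain decomposition step in isolation: block sizes are made proportional to relative processor speeds, so faster machines receive more rows. The weights and the remainder-distribution rule below are illustrative; the paper's framework also performs processor selection and task placement, which are not shown:

```cpp
#include <vector>
#include <numeric>
#include <iostream>

// Return the number of rows assigned to each of the processors, given
// relative speeds (any positive weights).
std::vector<long> decompose(long rows, const std::vector<double>& speed) {
    double total = std::accumulate(speed.begin(), speed.end(), 0.0);
    std::vector<long> share(speed.size());
    long assigned = 0;
    for (std::size_t i = 0; i < speed.size(); ++i) {
        share[i] = static_cast<long>(rows * speed[i] / total);  // proportional block
        assigned += share[i];
    }
    for (long r = rows - assigned; r > 0; --r)   // hand leftover rows out round-robin
        ++share[r % share.size()];
    return share;
}

int main() {
    for (long s : decompose(1000, {1.0, 2.0, 4.0}))  // prints: 142 286 572
        std::cout << s << ' ';
}
```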
2006
We present a new parallel computation model called the Parallel Resource-Optimal (PRO) computation model. PRO is a framework proposed to enable the design of efficient and scalable parallel algorithms in an architecture-independent manner and to simplify the analysis of such algorithms. A focus on three key features distinguishes PRO from existing parallel computation models. First, the design and analysis of a parallel algorithm in the PRO model is performed relative to the time and space complexity of a sequential reference algorithm. Second, a PRO algorithm is required to be both time- and space-optimal relative to the reference sequential algorithm. Third, the quality of a PRO algorithm is measured by the maximum number of processors that can be employed while optimality is maintained. Inspired by the Bulk Synchronous Parallel model, an algorithm in the PRO model is organized as a sequence of supersteps. Each superstep consists of distinct computation and communication phases, but the supersteps are not required to be separated by synchronization barriers. Both computation and communication costs are accounted for in the runtime analysis of a PRO algorithm. Experimental results on parallel algorithms designed using the PRO model, and implemented using its accompanying programming environment SSCRAP, demonstrate that the model indeed delivers efficient and scalable implementations on a wide range of platforms.
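Schematically, and in our notation rather than the paper's, the optimality requirement can be read as follows, where T(n) and S(n) are the time and space of the sequential reference algorithm:

```latex
% One natural reading of PRO's time- and space-optimality (notation ours):
\[
  T_{\mathrm{par}}(n, p) = O\!\left(\frac{T(n)}{p}\right),
  \qquad
  S_{\mathrm{total}}(n, p) = O\!\left(S(n)\right),
\]
% with the quality of the algorithm measured by the largest p for which
% both bounds continue to hold.
```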
Theoretical Computer Science, 1990
Abstract. This paper outlines a theory of parallel algorithms that emphasizes two crucial aspects of parallel computation: speedup, the improvement in running time due to parallelism, and efficiency, the ratio of work done by a parallel algorithm to the work done by a sequential algorithm. We define six classes of algorithms in these terms; of particular interest is the class EP of algorithms that achieve a polynomial speedup with constant efficiency. The relations between these classes are examined. We investigate the robustness of these classes across various models of parallel computation. To do so, we examine simulations across models where the simulating machine may be smaller than the simulated machine. These simulations are analyzed with respect to their efficiency and to the reduction in the number of processors. We show that a large number of parallel computation models are related via efficient simulations, if a polynomial reduction of the number of processors is allowed. This implies that the class EP is invariant across all these models. Many open problems motivated by our approach are listed. 1. Introduction. As parallel computers become increasingly available, a theory of parallel algorithms is needed to guide the design of algorithms for such machines. To be useful, such a theory must address two major concerns in parallel computation, namely speedup and efficiency. It should classify algorithms and problems into a few meaningful classes that are, to the largest extent possible, model independent. This paper outlines an approach to the analysis of parallel algorithms that we feel answers these concerns without sacrificing too much generality or abstractness. We propose a classification of parallel algorithms in terms of parallel running time and inefficiency, which is the extra amount of work done by a parallel algorithm as compared to a sequential algorithm. Both running time and inefficiency are measured as a function of the sequential running time, which is used as a yardstick. * A preliminary version of this paper was presented at the 15th International Colloquium on Automata,
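In the standard notation (ours, not necessarily the paper's), with T_1 the sequential running time and T_p the parallel running time on p processors, the two measures and the class EP read:

```latex
\[
  S(p) = \frac{T_1}{T_p} \quad \text{(speedup)},
  \qquad
  E(p) = \frac{T_1}{p \, T_p} \quad \text{(efficiency)}.
\]
% EP: algorithms with constant efficiency, E(p) = \Theta(1), whose speedup is
% polynomial in the sequential time, S(p) = \Omega(T_1^{\varepsilon}) for some
% \varepsilon > 0 -- our reading of "polynomial speedup with constant efficiency".
```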
Parallel Computing, 2000
The barrier synchronization of the BSP model imposes limits both on the range of available algorithms and on their performance. Although BSP programs can be translated into MPI/PVM programs, the converse is not true: the asynchronous nature of some MPI/PVM programs does not fit easily inside the BSP model. Through the suppression of barriers and the generalization of the concept of a superstep, we propose two new models, the BSP-like model and the BSP without barriers (BSPWB) model. While the BSP-like model extends the BSP* model to programs written using collective operations, the more general BSPWB model admits the MPI/PVM parallel asynchronous programming style. The parameters of the models and their quality are evaluated on four standard parallel platforms: the CRAY T3E, the IBM SP2, the Origin 2000, and the Digital AlphaServer 8400. The study shows that the time spent in an h-relation depends less on the number of processors than on the communication pattern. We illustrate the use of these BSP extensions through two problem-solving paradigms: the Nested Parallel Recursive Divide and Conquer Paradigm and the Virtual Pipeline Dynamic Programming Paradigm. The proposed paradigms show how nested parallelism and processor virtualization can be introduced in MPI and PVM without any negative impact on performance or model accuracy. The prediction of the communication times is robust even for problems where communication is dominated by small messages.
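For reference, the standard BSP cost of a superstep, which is the term these extensions relax (in the barrier-free BSPWB style the final synchronization term drops out); the notation is the usual BSP one, not taken from the paper:

```latex
\[
  T_{\text{superstep}} = w + g \cdot h + L
\]
% w: maximum local computation time over the processors; h: maximum number of
% words any processor sends or receives (an h-relation); g: communication cost
% per word; L: the barrier synchronization latency that BSPWB suppresses.
```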
Lecture Notes in Computer Science, 2002
The emerging discipline of algorithm engineering has primarily focused on transforming pencil-and-paper sequential algorithms into robust, efficient, well tested, and easily used implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.
2018
Traditional parallel data-processing algorithms split the information to be processed into parts, according to some principle (the number of cores, the speed of processing, etc.), and then process each part separately on a different core (or processor). Such partitioning takes a rather long time and is not always an optimal solution: it cannot minimize the idle time of the cores, and an optimal choice (or a change of algorithm during execution) is not always possible. The algorithms we provide follow the principle that processing by the cores should proceed in parallel while minimizing core idle time. The article reviews two algorithms that work according to this principle: "smart-delay" and a development of the ribbon-like algorithm for transposed matrix multiplication.
Nowadays we use very large databases in real life, but processing, manipulating, and updating the data takes considerable time. In computer science, Divide and Conquer (D & C) is an important algorithm design paradigm based on multi-branched recursion. A divide-and-conquer algorithm works by recursively breaking a problem down into two or more subproblems of the same (or a related) type until these become simple enough to be solved directly; the solutions to the subproblems are then combined to give a solution to the original problem. This technique is already widely used; here I attempt to implement the Divide and Conquer (D & C) technique on a parallel computer using a network.
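Merge sort is the textbook instance of the three steps this paragraph names: divide into two subproblems, solve them recursively (here, the two halves in parallel), and combine the sub-solutions. A sketch of our own, not the article's implementation; a realistic version would add a sequential cut-off instead of spawning a task at every level:

```cpp
#include <algorithm>
#include <future>
#include <vector>
#include <iostream>

void merge_sort(std::vector<int>& v, size_t lo, size_t hi) {
    if (hi - lo < 2) return;                    // simple enough: solved directly
    size_t mid = lo + (hi - lo) / 2;            // divide
    auto left = std::async(std::launch::async,  // conquer the halves in parallel
                           merge_sort, std::ref(v), lo, mid);
    merge_sort(v, mid, hi);
    left.get();
    std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);  // combine
}

int main() {
    std::vector<int> v{5, 3, 8, 1, 9, 2, 7};
    merge_sort(v, 0, v.size());
    for (int x : v) std::cout << x << ' ';      // prints: 1 2 3 5 7 8 9
}
```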
This book grew out of lecture notes for a course on parallel algorithms that I gave at Drexel University over a period of several years. I was frustrated by the lack of texts that had the focus that I wanted. Although the book also addresses some architectural issues, the main focus is on the development of parallel algorithms on "massively parallel" computers. This book could be used in several versions of a course on Parallel Algorithms. We tend to focus on SIMD parallel algorithms in several general areas of application:
Cluster Computing
Divide-and-conquer is one of the most important patterns of parallelism, being applicable to a large variety of problems. In addition, the most powerful parallel systems available nowadays are computer clusters composed of distributed-memory nodes that contain an increasing number of cores sharing a common memory. The optimal exploitation of these systems often requires resorting to a hybrid model that mimics the underlying hardware by combining a distributed-memory and a shared-memory parallel programming model, which results in longer development times and increased maintenance costs. In this paper we present a very general skeleton library that makes it possible to parallelize any divide-and-conquer problem on hybrid distributed-shared memory systems with little effort, while providing much flexibility and good performance. Our proposal combines a message-passing paradigm at the process level with a threaded model inside each process, hiding the related complexity from the user. The evaluation shows that this skeleton provides performance comparable to, and often better than, that of manually optimized codes, while requiring considerably less effort when parallelizing applications on multi-core clusters.
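What such a skeleton hides is roughly the following hybrid pattern: message passing between processes combined with threads inside each process. A schematic fragment of our own, assuming MPI plus std::thread (this is not the library's API; error handling and real data distribution are omitted, and the per-thread loop is only toy work):

```cpp
#include <mpi.h>
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Process level: each rank owns one top-level subproblem (a real code
    // would scatter input data here via message passing).
    long local = 0, global = 0;

    // Thread level: split the rank's subproblem across its cores.
    unsigned nt = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    std::vector<long> partial(nt, 0);
    for (unsigned t = 0; t < nt; ++t)
        pool.emplace_back([&, t] {
            for (long i = t; i < 1000000; i += nt) partial[t] += 1;  // toy work
        });
    for (auto& th : pool) th.join();
    for (long p : partial) local += p;

    // Combine step across processes.
    MPI_Reduce(&local, &global, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("total = %ld over %d processes\n", global, size);
    MPI_Finalize();
}
```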
The Journal of Supercomputing, 1988
A formal algebraic model for divide-and-conquer algorithms is presented. The model reveals the internal structure of divide-and-conquer functions, leads to high-level, functional-style algorithm specifications, and simplifies complexity analysis. Algorithms developed under the model contain vast amounts of parallelism and can be mapped fairly easily to parallel computers.
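The structure such a model reveals can be written as a single higher-order function, dac = combine ∘ map dac ∘ divide. The C++ rendering and parameter names below are ours, and the range-sum instantiation is only a toy example; parallelism arises because the divided subproblems are solved independently:

```cpp
#include <functional>
#include <numeric>
#include <vector>
#include <iostream>

template <class P, class S>
S dac(const P& prob,
      std::function<bool(const P&)> trivial,
      std::function<S(const P&)> solve,
      std::function<std::vector<P>(const P&)> divide,
      std::function<S(const std::vector<S>&)> combine) {
    if (trivial(prob)) return solve(prob);
    std::vector<S> sub;
    for (const P& p : divide(prob))             // each recursive call is independent
        sub.push_back(dac(p, trivial, solve, divide, combine));
    return combine(sub);
}

int main() {
    using Range = std::pair<int, int>;          // sum of [lo, hi) as a tiny instance
    std::cout << dac<Range, long>(
        {0, 100},
        [](const Range& r) { return r.second - r.first <= 8; },
        [](const Range& r) { long s = 0;
                             for (int i = r.first; i < r.second; ++i) s += i;
                             return s; },
        [](const Range& r) { int m = (r.first + r.second) / 2;
                             return std::vector<Range>{{r.first, m}, {m, r.second}}; },
        [](const std::vector<long>& s) {
            return std::accumulate(s.begin(), s.end(), 0L); })
              << "\n";                          // prints 4950
}
```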