2010
This thesis reviews selected topics from the theory of parallel computation. The research begins with a survey of the proposed models of parallel computation, examining the characteristics of each model and discussing its use for either theoretical studies or practical applications. It then employs common simulation techniques to evaluate the computational power of these models. The simulations establish certain relations between the models before advancing to a detailed study of parallel complexity theory, which is the subject of the second part of this thesis. The second part examines classes of feasible, highly parallel problems and investigates the limits of parallelization. It is concerned with the benefits of parallel solutions and the extent to which they can be applied to all problems. It analyzes the parallel complexity of various well-known tractable problems and discusses the automatic parallelization of efficient sequential algorithms. Moreover, it ...
Theoretical Computer Science, 1990
Abstract. This paper outlines a theory of parallel algorithms that emphasizes two crucial aspects of parallel computation: speedup, the improvement in running time due to parallelism, and efficiency, the ratio of work done by a parallel algorithm to the work done by a sequential algorithm. We define six classes of algorithms in these terms; of particular interest is the class EP, of algorithms that achieve a polynomial speedup with constant efficiency. The relations between these classes are examined. We investigate the robustness of these classes across various models of parallel computation. To do so, we examine simulations across models where the simulating machine may be smaller than the simulated machine. These simulations are analyzed with respect to their efficiency and to the reduction in the number of processors. We show that a large number of parallel computation models are related via efficient simulations, if a polynomial reduction of the number of processors is allowed. This implies that the class EP is invariant across all these models. Many open problems motivated by our approach are listed.

1. Introduction. As parallel computers become increasingly available, a theory of parallel algorithms is needed to guide the design of algorithms for such machines. To be useful, such a theory must address two major concerns in parallel computation, namely speedup and efficiency. It should classify algorithms and problems into a few, meaningful classes that are, to the largest extent possible, model independent. This paper outlines an approach to the analysis of parallel algorithms that we feel answers these concerns without sacrificing too much generality or abstractness. We propose a classification of parallel algorithms in terms of parallel running time and inefficiency, which is the extra amount of work done by a parallel algorithm as compared to a sequential algorithm. Both running time and inefficiency are measured as a function of the sequential running time, which is used as a yardstick. * A preliminary version of this paper was presented at 15th International Colloquium on Automata,
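As a worked illustration of these definitions (the running times below are invented for the example, not taken from the paper): suppose a problem has sequential running time $T(n) = n^2$ and a parallel algorithm solves it in time $T_p(n) = n$ on $p = n$ processors. Then

\[
S = \frac{T}{T_p} = \frac{n^2}{n} = n = T^{1/2},
\qquad
E = \frac{T}{p \cdot T_p} = \frac{n^2}{n \cdot n} = 1,
\]

so the speedup is polynomial in the sequential running time and the efficiency is constant; an algorithm with these bounds would belong to the class EP discussed above.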
IEEE Symposium on Foundations of Computer Science, 1980
Lecture Notes in Computational Science and Engineering, 2003
Large-scale parallel computations are more common than ever, due to the increasing availability of multi-processor systems. However, writing parallel software is often a complicated and error-prone task. To relieve Diffpack users of the tedious and low-level technical details of parallel programming, we have designed a set of new software modules, tools, and programming rules, which will be the topic of the present chapter.
Next-generation computing systems focus on parallel computing, using parallel programming concepts to solve problems at high speed. The running time of a complex algorithm or application can be improved by running more than one task at the same time on multiple processors concurrently. Here, the performance improvement is measured in terms of the increase in the number of cores per machine and is analyzed for an energy-efficient, optimal workload balance. In this paper we present a review and literature survey of parallel computers with a view to future improvements. We aim to present the theoretical and technical terminology of parallel computing. The focus is a comparative analysis of single-core and multicore systems running an application program, aiming for faster execution time and an optimized scheduler for better performance.
Undergraduate Topics in Computer Science, 2018
Undergraduate Topics in Computer Science (UTiCS) delivers high-quality instructional content for undergraduates studying in all areas of computing and information science. From core foundational and theoretical material to final-year topics and applications, UTiCS books take a fresh, concise, and modern approach and are ideal for self-study or for a one- or two-semester course. The texts are all authored by established experts in their fields, reviewed by an international advisory board, and contain numerous examples and problems. Many include fully worked solutions.
The main goal of this research is to use the OpenMP, POSIX Threads, and Microsoft Parallel Patterns libraries to design an algorithm that computes matrix multiplication efficiently, and to use these libraries to improve the speedup of the algorithm. The first step is to write a simple program that multiplies a predetermined matrix and reports the result after compilation and execution; at this stage only a single processor core is used. The OpenMP, POSIX Threads, and Microsoft Parallel Patterns libraries are then added separately, using library functions that parallelize the computation across multiple cores. A timer is added to the code to measure how long the computation takes. The program is first run without any parallel library and then with the OpenMP, POSIX Threads, and Microsoft Parallel Patterns versions; it is executed for each input matrix size and the results are collected, with up to five trials per input size. Finally, the performance of OpenMP, POSIX Threads, and Microsoft Parallel Patterns is compared in terms of execution time and speedup for different matrix dimensions and numbers of processors. A minimal OpenMP sketch of this kind of measurement appears below.
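The sketch below is not the paper's code; the matrix dimension N, the initialization values, and the use of omp_get_wtime for timing are illustrative assumptions. It shows the basic pattern of parallelizing the outer loop of a square matrix multiplication with OpenMP and measuring its wall-clock time.

/* Minimal OpenMP matrix multiplication sketch (illustrative only).
   Compile with: cc -fopenmp matmul.c -o matmul */
#include <stdio.h>
#include <omp.h>

#define N 512  /* illustrative matrix dimension */

int main(void) {
    static double a[N][N], b[N][N], c[N][N];

    /* fill the input matrices with arbitrary values */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
        }

    double start = omp_get_wtime();

    /* distribute the rows of the result matrix across the available cores */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }

    double elapsed = omp_get_wtime() - start;
    printf("N = %d, threads = %d, time = %.3f s\n",
           N, omp_get_max_threads(), elapsed);
    return 0;
}

Running the same code without the OpenMP pragma (or compiled without -fopenmp) gives the single-core baseline against which speedup can be computed.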
International Journal of Computer Applications, 2012
This paper surveys current trends in parallel computing, covering all aspects of a parallel computing system. A large computational problem that cannot be solved by a single CPU can be divided into small enough subtasks that are processed simultaneously by a parallel computer. A parallel computer consists of parallel computing hardware, a parallel computing model, and software support for parallel programming. Parallel performance measurement parameters and parallel benchmarks are used to measure the performance of a parallel computing system. The hardware and the software are specially designed for parallel algorithms and programming. This paper explores all these aspects of parallel computing and their usefulness.
21st Annual Symposium on Foundations of Computer Science (sfcs 1980), 1980
2006
We present a new parallel computation model called the Parallel Resource-Optimal (PRO) computation model. PRO is a framework proposed to enable the design of efficient and scalable parallel algorithms in an architecture-independent manner and to simplify the analysis of such algorithms. Three key features distinguish PRO from existing parallel computation models. First, the design and analysis of a parallel algorithm in the PRO model is performed relative to the time and space complexity of a sequential reference algorithm. Second, a PRO algorithm is required to be both time- and space-optimal relative to the reference sequential algorithm. Third, the quality of a PRO algorithm is measured by the maximum number of processors that can be employed while optimality is maintained. Inspired by the Bulk Synchronous Parallel model, an algorithm in the PRO model is organized as a sequence of supersteps. Each superstep consists of distinct computation and communication phases, but the supersteps are not required to be separated by synchronization barriers; a sketch of this superstep organization appears below. Both computation and communication costs are accounted for in the runtime analysis of a PRO algorithm. Experimental results on parallel algorithms designed using the PRO model, and implemented using its accompanying programming environment SSCRAP, demonstrate that the model indeed delivers efficient and scalable implementations on a wide range of platforms.
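The sketch below is a rough illustration only; it is not taken from the paper and does not use SSCRAP. Written with MPI, it shows a computation organized as a sequence of supersteps, each with a local computation phase followed by a communication phase. The number of supersteps, the local work, and the use of a collective operation for the communication phase are assumptions chosen for the example.

/* Illustrative superstep-structured computation in the BSP/PRO spirit. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    long local = 0, global = 0;
    const int supersteps = 4;               /* illustrative value */
    for (int s = 0; s < supersteps; s++) {
        /* computation phase: purely local work on this processor */
        for (int i = 0; i < 1000; i++)
            local += (long)rank + s + i;

        /* communication phase: exchange partial results between processors;
           a collective stands in for the data exchange here, and no explicit
           barrier separates consecutive supersteps */
        MPI_Allreduce(&local, &global, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("result after %d supersteps on %d processors: %ld\n",
               supersteps, nprocs, global);

    MPI_Finalize();
    return 0;
}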
This book grew out of lecture notes for a course on parallel algorithms that I gave at Drexel University over a period of several years. I was frustrated by the lack of texts that had the focus that I wanted. Although the book also addresses some architectural issues, the main focus is on the development of parallel algorithms on "massively parallel" computers. This book could be used in several versions of a course on Parallel Algorithms. We tend to focus on SIMD parallel algorithms in several general areas of application:
Journal of Parallel and Distributed Computing, 1993
There are several metrics that characterize the performance of a parallel system, such as parallel execution time, speedup and efficiency. A number of properties of these metrics have been studied. For example, it is a well-known fact that, given a parallel architecture and a problem of a fixed size, the speedup of a parallel algorithm does not continue to increase with increasing number of processors. It usually tends to saturate or peak at a certain limit. Thus it may not be useful to employ more than an optimal number of processors for solving a problem on a parallel computer. This optimal number of processors depends on the problem size, the parallel algorithm and the parallel architecture. In this paper we study the impact of parallel processing overheads and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time. We then study a more general criterion of optimality and show how operating at the optimal point is equivalent to operating at a unique value of efficiency which is characteristic of the criterion of optimality and the properties of the parallel system under study. We put the technical results derived in this paper in perspective with similar results that have appeared in the literature before and show how this paper generalizes and/or extends these earlier results.
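As a numerical illustration of this saturation effect (the overhead model T_p = T_s/p + c*p and all constants below are assumptions, not taken from the paper), the following small C program tabulates execution time, speedup and efficiency and reports the processor count that minimizes the execution time.

/* Illustrative only: parallel time modeled as T_s/p plus a linear
   per-processor overhead; the minimum occurs at a finite p. */
#include <stdio.h>

int main(void) {
    const double t_seq = 1.0e6;   /* assumed sequential time (arbitrary units) */
    const double overhead = 50.0; /* assumed per-processor coordination cost */

    double best_time = 1e300;
    int best_p = 0;

    for (int p = 1; p <= 4096; p *= 2) {
        double t_par = t_seq / p + overhead * p;  /* computation + overhead */
        double speedup = t_seq / t_par;
        double efficiency = speedup / p;
        printf("p=%5d  T_p=%10.1f  speedup=%7.2f  efficiency=%.3f\n",
               p, t_par, speedup, efficiency);
        if (t_par < best_time) { best_time = t_par; best_p = p; }
    }
    printf("execution time is minimized near p = %d\n", best_p);
    return 0;
}

With these assumed constants the tabulated time falls, bottoms out around p = 128, and then rises again as the overhead term dominates, which is the peaking behavior the abstract describes.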
Parallel computing has become an essential subject in the field of computer science, and it has proved critical when researching high-end solutions. With the evolution of computer architectures (multicore and manycore) toward an increasing number of cores, where parallelism is the approach of choice for speeding up an algorithm, the graphics processing unit (GPU) has over the last few decades gained an essential place in the area of high-end computing (HPC) due to its low cost and massive parallel processing power. In this paper, we survey the idea of parallel computing, especially GPU computing, and its programming models, and we also give some theoretical and technical concepts that are often needed to understand the CPU and the GPU and their massive parallelism. In particular, we show how this technology can assist the field of computational physics, especially when the problem is data parallel.
Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications, 2002
We present a new parallel computation model that enables the design of resource-optimal scalable parallel algorithms and simplifies their analysis. The model rests on the novel idea of incorporating relative optimality as an integral part and measuring the quality of a parallel algorithm in terms of granularity.
Recent advances in computer systems have been making the conventional computational complexity theories inappropriate in various situations. The random access memory (RAM) model was made unrealistic by the large speed gap between the processing units and the main memory systems. Distributed computing environments have obsoleted most of the parallel computation models proposed so far due to non-negligible diversity in communication delay. In this paper, we propose a new framework for computational complexity, named access complexity, in which the cost is assumed to lie in data transfer rather than computation itself. The model tries to capture all levels of the system hierarchy, from cache systems to globally distributed environments. The abstract machine provided by the model has a uniform memory field and a uniform communication layer, which are meant to capture a variety of parallel computation environments. Diverse access costs in actual computing systems are modeled as different latencies in the communication layer in a uniform way with a simple congestion model. With the model, some parallel algorithms, namely bitonic sort, merge sort, and FFT, are analyzed, showing that the model facilitates performance analysis. We also show the appropriateness of the model by comparing the predicted and the measured performances of these algorithms.
2011 International Conference on Parallel Processing, 2011
For a given algorithm, the energy consumed in executing the algorithm has a nonlinear relationship with performance. In the case of parallel algorithms, energy use and performance are functions of the structure of the algorithm. We define the asymptotic energy complexity of algorithms, which models the minimum energy required to execute a parallel algorithm for a given execution time as a function of input size. Our methodology provides a way of comparing the orders of (minimal) energy required by different algorithms and can be used to define energy complexity classes of parallel algorithms.