2013, International Journal of Parallel Programming
This special issue provides a forum for presenting the latest research on algorithms and applications for parallel and distributed systems, including algorithm design and optimization, programming paradigms, algorithm design and programming techniques for heterogeneous computing systems, tools and environments for parallel/distributed software development, petascale and exascale algorithms, novel parallel and distributed applications, and performance simulation, measurement, and evaluation. The success of parallel algorithms, even on problems that at first glance seem inherently serial, suggests that this style of programming will be inherent to virtually any application in the near future. The relevant research has gained momentum with multicore and manycore architectures, and with the expected arrival of exascale computing. As a result, the space of potential ideas and solutions is still far from being widely explored.
Lecture Notes in Computer Science, 2006
Welcome to the proceedings of the 4th International Symposium on Parallel and Distributed Processing and Applications (ISPA 2006), which was held in Sorrento, Italy, December 4-6, 2006. Parallel computing has become a mainstream research area in computer science, and the ISPA conference has become one of the premier forums for the presentation of new and exciting research on all aspects of parallel and distributed computing. We are pleased to present the proceedings for ISPA 2006, which comprise a collection of excellent technical papers and keynote speeches. The accepted papers cover a wide range of exciting topics including architectures, languages, algorithms, software, networking and applications. The conference continues to grow, and this year a record total of 277 manuscripts were submitted for consideration by the Program Committee. From these submissions the Program Committee selected 79 regular papers for the program, an acceptance rate of 28%. An additional 10 workshops complemented the outstanding paper sessions. The submission and review process worked as follows. Each submission was assigned to at least three Program Committee members for review. Each Program Committee member prepared a single review for each assigned paper or assigned a paper to an outside reviewer for review. In addition, the Program Chairs and Program Vice-Chairs read the papers when a conflicting review result occurred. Finally, after much discussion among the Program Chairs and Program Vice-Chairs, based on the review scores, the Program Chairs made the final decision. Given the large number of submissions, each Program Committee member was assigned roughly 7-12 papers. The excellent program required a lot of effort from many people. First, we would like to thank all the authors for their hard work in preparing submissions to the conference. We deeply appreciate the effort and contributions of the Program Committee members who worked very hard to select the very best submissions and to put together an exciting program. We are also very grateful to the keynote speakers for accepting our invitation to present keynote talks. Thanks go to the Workshop Chairs for organizing ten excellent workshops on several important topics related to parallel and distributed computing and applications.
Lecture Notes in Computer Science, 2012
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
2006
We present a new parallel computation model called the Parallel Resource-Optimal (PRO) computation model. PRO is a framework proposed to enable the design of efficient and scalable parallel algorithms in an architecture-independent manner, and to simplify the analysis of such algorithms. A focus on three key features distinguishes PRO from existing parallel computation models. First, the design and analysis of a parallel algorithm in the PRO model is performed relative to the time and space complexity of a sequential reference algorithm. Second, a PRO algorithm is required to be both time- and space-optimal relative to the reference sequential algorithm. Third, the quality of a PRO algorithm is measured by the maximum number of processors that can be employed while optimality is maintained. Inspired by the Bulk Synchronous Parallel (BSP) model, an algorithm in the PRO model is organized as a sequence of supersteps. Each superstep consists of distinct computation and communication phases, but the supersteps are not required to be separated by synchronization barriers. Both computation and communication costs are accounted for in the runtime analysis of a PRO algorithm. Experimental results on parallel algorithms designed using the PRO model, and implemented using its accompanying programming environment SSCRAP, demonstrate that the model indeed delivers efficient and scalable implementations on a wide range of platforms.
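The superstep organization described above can be sketched in code. The following is a minimal illustrative skeleton, assuming an MPI message-passing layer rather than the SSCRAP environment named in the abstract; the per-processor work and the global reduction are arbitrary placeholders, not the authors' algorithm.

```c
/* Minimal BSP/PRO-style superstep skeleton (illustrative sketch only).
 * Assumes MPI as the communication layer; SSCRAP is not used here. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double local = (double)rank;   /* per-processor data (placeholder) */
    const int nsupersteps = 4;     /* number of supersteps (arbitrary) */

    for (int s = 0; s < nsupersteps; ++s) {
        /* Computation phase: purely local work on this processor's data. */
        local = local * 0.5 + 1.0;

        /* Communication phase: exchange results with other processors.
         * A global sum stands in here for an arbitrary communication step. */
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        local = global / nprocs;
        /* PRO does not require a barrier between supersteps; the collective
         * above implies one only for the simplicity of this sketch. */
    }

    if (rank == 0)
        printf("result after %d supersteps: %f\n", nsupersteps, local);

    MPI_Finalize();
    return 0;
}
```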
This paper presents a survey of current trends in parallel computing that covers all aspects of a parallel computing system. A large computational problem that cannot be solved by a single CPU can be divided into a number of sufficiently small subtasks that are processed simultaneously by a parallel computer. A parallel computer consists of parallel computing hardware, a parallel computing model, and software support for parallel programming. Parallel performance measurement parameters and parallel benchmarks are used to measure the performance of a parallel computing system. The hardware and the software are specially designed for parallel algorithms and programming. This paper explores all aspects of parallel computing and its usefulness.
This book grew out of lecture notes for a course on parallel algorithms that I gave at Drexel University over a period of several years. I was frustrated by the lack of texts that had the focus that I wanted. Although the book also addresses some architectural issues, the main focus is on the development of parallel algorithms on "massively parallel" computers. This book could be used in several versions of a course on Parallel Algorithms. We tend to focus on SIMD parallel algorithms in several general areas of application:
Parallel processing offers enhanced speed of execution to the user and is facilitated by different approaches such as data parallelism and control parallelism. Graphics Processing Units (GPUs) provide faster execution due to dedicated hardware and tools. This paper presents two popular approaches and techniques, distributed computing and GPU computing, to assist a novice in parallel computing techniques. The paper discusses the environment that needs to be set up for both approaches and, as a case study, demonstrates a matrix multiplication algorithm using a SIMD architecture.
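As a rough illustration of the data-parallel pattern behind such a case study, the sketch below multiplies two matrices with OpenMP on the CPU. It is a stand-in for the general technique under our own assumptions (matrix size N, all-ones/all-twos inputs), not the paper's GPU/SIMD implementation.

```c
/* Data-parallel matrix multiplication sketch (C with OpenMP).
 * Illustrative only; does not reproduce the paper's GPU code. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 512   /* matrix dimension (arbitrary for the sketch) */

int main(void)
{
    double *A = malloc(N * N * sizeof(double));
    double *B = malloc(N * N * sizeof(double));
    double *C = malloc(N * N * sizeof(double));

    for (int i = 0; i < N * N; ++i) { A[i] = 1.0; B[i] = 2.0; }

    /* Each (i, j) entry of C is computed independently, so the outer
     * loops can be distributed across threads without synchronization. */
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            double sum = 0.0;
            for (int k = 0; k < N; ++k)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }

    printf("C[0][0] = %f\n", C[0]);   /* expect 2*N = 1024 */
    free(A); free(B); free(C);
    return 0;
}
```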
Lecture Notes in Computer Science, 2002
The emerging discipline of algorithm engineering has primarily focused on transforming pencil-and-paper sequential algorithms into robust, efficient, well tested, and easily used implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.
Journal of Parallel and Distributed Computing, 1993
There are several metrics that characterize the performance of a parallel system, such as parallel execution time, speedup and efficiency. A number of properties of these metrics have been studied. For example, it is a well known fact that given a parallel architecture and a problem of a fixed size, the speedup of a parallel algorithm does not continue to increase with increasing number of processors. It usually tends to saturate or peak at a certain limit. Thus it may not be useful to employ more than an optimal number of processors for solving a problem on a parallel computer. This optimal number of processors depends on the problem size, the parallel algorithm and the parallel architecture. In this paper we study the impact of parallel processing overheads and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time. We then study a more general criterion of optimality and show how operating at the optimal point is equivalent to operating at a unique value of efficiency which is characteristic of the criterion of optimality and the properties of the parallel system under study. We put the technical results derived in this paper in perspective with similar results that have appeared in the literature before and show how this paper generalizes and/or extends these earlier results.
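For orientation, the metrics named in this abstract are conventionally defined as follows (standard textbook definitions, not notation taken from this particular paper), with T_1 the sequential execution time and T_p the parallel execution time on p processors:

```latex
% Standard definitions of the metrics named above (textbook conventions):
% T_1 = sequential execution time, T_p = parallel execution time on p processors.
\[
  S(p) = \frac{T_1}{T_p}, \qquad
  E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}, \qquad
  T_o(p) = p\,T_p - T_1,
\]
% where S is the speedup, E the efficiency, and T_o the total parallel overhead.
```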
1995
The best enterprises have both a compelling need pulling them forward and an innovative technological solution pushing them on. In high-performance computing, we have the need for increased computational power in many applications and the inevitable long-term solution is massive parallelism. In the short term, the relation between pull and push may seem unclear as novel algorithms and software are needed to support parallel computing.
2000
Large-scale parallel computations are more common than ever, due to the increasing availability of multi-processor systems. However, writing parallel software is often a complicated and error-prone task. To relieve Diffpack users of the tedious and low-level technical details of parallel programming, we have designed a set of new software modules, tools, and programming rules, which will be the topic of
2012
Modern computer systems are becoming increasingly heterogeneous by comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the full compute capability of a system. In this paper, we present dOpenCL (Distributed OpenCL), a uniform approach to programming distributed heterogeneous systems with accelerators. dOpenCL extends the OpenCL standard such that arbitrary computing devices installed on any node of a distributed system can be used together within a single application. dOpenCL allows moving data and program code to these devices in a transparent, portable manner. Since dOpenCL is designed as a fully-fledged implementation of the OpenCL API, it allows running existing OpenCL applications in a heterogeneous distributed environment without any modifications. We describe in detail the mechanisms that are required to implement OpenCL for distributed systems, including a device management mechanism for running multiple applications concurrently. Using three application studies, we compare the performance of dOpenCL with MPI+OpenCL and a standard OpenCL implementation.
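To make the "no modifications" claim concrete, the sketch below is ordinary OpenCL host code for a vector addition. It uses only the standard OpenCL API, so code of this shape is the kind that, according to the abstract, can be relinked against a dOpenCL implementation and transparently use remote devices; it is our own illustrative example, not code from the paper, and error checking is omitted for brevity.

```c
/* Standard OpenCL host code (vector addition), illustrative sketch only. */
#include <CL/cl.h>
#include <stdio.h>

static const char *kernel_src =
    "__kernel void vadd(__global const float *a,"
    "                   __global const float *b,"
    "                   __global float *c) {"
    "    int i = get_global_id(0);"
    "    c[i] = a[i] + b[i];"
    "}";

int main(void)
{
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Discover a platform and a device; under dOpenCL these may be remote. */
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Build the kernel from source and create device buffers. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel vadd = clCreateKernel(prog, "vadd", NULL);

    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof(a), a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof(b), b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof(c), NULL, NULL);

    clSetKernelArg(vadd, 0, sizeof(cl_mem), &da);
    clSetKernelArg(vadd, 1, sizeof(cl_mem), &db);
    clSetKernelArg(vadd, 2, sizeof(cl_mem), &dc);

    /* Launch the kernel and read the result back to the host. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, vadd, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof(c), c, 0, NULL, NULL);

    printf("c[10] = %f\n", c[10]);   /* expect 30.0 */
    return 0;
}
```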
Lecture Notes in Computer Science, 1995
This article focuses on principles for the design of efficient parallel algorithms for distributed memory computing systems. We describe the general trend in the development of architectural properties and evaluate the state-of-the-art in a number of basic primitives like graph embedding, partitioning, dynamic load distribution, and communication, which are used, to some extent, within all parallel applications. We discuss possible directions for future work on the design of universal basic primitives, able to perform efficiently on a broad range of parallel systems and applications, and we also give certain examples of specific applications which demand specialized basic primitives in order to obtain efficient parallel implementations. Finally, we show that programming frames can offer a convenient way to encapsulate algorithmic know-how on applications and basic primitives and to offer this knowledge to nonspecialist users in a very effective way.
1997
This book is a must for anyone interested in entering the fascinating new world of parallel optimization using parallel processors, computers capable of doing an enormous number of complex operations in a nanosecond. The authors are among the pioneers of this fascinating new world, and they tell us what new applications they explored, what algorithms appear to work best, how parallel processors differ in their design, and what the comparative results were using different types of algorithms on different types of parallel processors to solve them. According to an old adage, the whole can sometimes be much more than the sum of its parts. I am thoroughly in agreement with the authors' belief in the added value of bringing together Applications, Mathematical Algorithms and Parallel Computing techniques. This is exactly what they found true in their own research and report on in the book. Many years ago, I, too, experienced the thrill of combining three diverse disciplines: the Application (in my case Linear Programs), the Solution Algorithm (the Simplex Method), and the then New Tool (the Serial Computer). The union of the three made possible the optimization of many real-world problems. Parallel processors are the new generation, and they have the power to tackle applications which require solution in real time, or have model parameters which are not known with certainty, or have a vast number of variables and constraints. Image restoration, tomography, radiation therapy, finance, industrial planning, transportation and economics are the sources for many of the interesting practical problems used by the authors to test the methodology.
Next-generation computing systems focus on parallel computing to solve problems quickly using parallel programming concepts. The running time of any complex algorithm or application can be improved by running more than one task at the same time on multiple processors concurrently. Here the performance improvement is measured in terms of the increase in the number of cores per machine and is analyzed for an energy-efficient, optimal workload balance. In this paper we present a review and literature survey of parallel computers with a view to future improvements. We aim to present the theoretical and technical terminology of parallel computing. The focus is a comparative analysis of single-core and multicore systems running an application program, looking at execution time and at optimizing the scheduler for better performance.
Lecture Notes in Computer Science, 2002
Choice Reviews Online, 2004
The major parallel programming models for scalable parallel architectures are the message passing model and the shared memory model. This article outlines the main concepts of these models as well as the industry standard programming interfaces MPI and OpenMP. To exploit the potential performance of parallel computers, programs need to be carefully designed and tuned. We will discuss design decisions for good performance as well as programming tools that help the programmer in program tuning.
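As a minimal illustration of the two models discussed above, the sketch below expresses the same global sum once with OpenMP directives (shared memory) and once with MPI calls (message passing). It is a generic example under our own assumptions, not code from the article; the USE_MPI macro merely selects which variant to build.

```c
/* The same global sum in the two dominant programming models.
 * Illustrative sketch: build the OpenMP variant with -fopenmp,
 * or define USE_MPI and build with an MPI wrapper such as mpicc. */
#include <stdio.h>

#ifdef USE_MPI
#include <mpi.h>
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Message-passing model: each process owns its data and
     * combines partial results through explicit communication. */
    double partial = rank + 1.0, total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("MPI total = %f\n", total);
    MPI_Finalize();
    return 0;
}
#else
#include <omp.h>
int main(void)
{
    /* Shared-memory model: threads accumulate into a shared result
     * via a reduction clause; no explicit communication is written. */
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 1; i <= 1000; ++i)
        total += (double)i;

    printf("OpenMP total = %f\n", total);
    return 0;
}
#endif
```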
Journal of Communications and Networks, 2014
The high intensity of research and modeling in the fields of mathematics, physics, biology and chemistry requires new computing resources. Because of the large computational complexity of such tasks, computing time is long and costly. The most effective way to increase efficiency is to adopt parallel principles. The purpose of this paper is to present the issue of parallel computing with emphasis on the analysis of parallel systems and on the impact of communication delays on their efficiency and overall execution time. The paper focuses on finite algorithms for solving systems of linear equations, namely matrix manipulation (the Gauss elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, OpenMP), distributed memory (message passing interface, MPI) and for their combination (MPI + OpenMP). The properties of the algorithms were analytically determined and experimentally verified. Conclusions are drawn for theory and practice.
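To make the shared-memory variant concrete, the following is a minimal sketch of GEM with the forward-elimination loop parallelized using OpenMP. It is a generic illustration under our own assumptions (a small dense system, no pivoting, sequential back substitution), not the authors' code; the MPI and hybrid variants would instead distribute the rows across processes.

```c
/* Gauss elimination (GEM) with OpenMP-parallel forward elimination.
 * Illustrative sketch: no pivoting, small dense augmented matrix [A | b]. */
#include <omp.h>
#include <stdio.h>

#define N 4

int main(void)
{
    double a[N][N + 1] = {
        {  2,  1, -1, 2,  5 },
        {  4,  5, -3, 6,  9 },
        { -2,  5, -2, 6,  4 },
        {  4, 11, -4, 8,  2 }
    };

    for (int k = 0; k < N - 1; ++k) {
        /* Rows below the pivot row are updated independently,
         * so the loop over i can be shared among threads. */
        #pragma omp parallel for
        for (int i = k + 1; i < N; ++i) {
            double m = a[i][k] / a[k][k];
            for (int j = k; j <= N; ++j)
                a[i][j] -= m * a[k][j];
        }
    }

    /* Back substitution (kept sequential here for brevity). */
    double x[N];
    for (int i = N - 1; i >= 0; --i) {
        double s = a[i][N];
        for (int j = i + 1; j < N; ++j)
            s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }

    for (int i = 0; i < N; ++i)
        printf("x[%d] = %f\n", i, x[i]);   /* expect 1, -2, 1, 3 */
    return 0;
}
```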