1993
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
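As a concrete illustration of how the four parameters are used (our own sketch, not part of the abstract): under the usual reading of LogP, one small message costs the sender an overhead o, the network a latency L, and the receiver another overhead o, while consecutive sends from one processor must be spaced by at least the gap g. A minimal C sketch of that cost accounting, with parameter values invented for illustration:

#include <stdio.h>

/* Illustrative LogP cost accounting; parameter values are invented.
 * L = network latency, o = per-message send/receive overhead,
 * g = minimum gap between consecutive sends from one processor. */
static double logp_k_messages(double L, double o, double g, int k) {
    /* k pipelined small messages to one receiver: the last message is
     * injected after (k-1)*g plus its send overhead o, then needs L to
     * cross the network and o to be received. */
    return (k - 1) * g + 2 * o + L;
}

int main(void) {
    double L = 6.0, o = 2.0, g = 4.0;   /* in microseconds, say */
    for (int k = 1; k <= 8; k *= 2)
        printf("k=%d messages: %.1f us\n", k, logp_k_messages(L, o, g, k));
    return 0;
}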
1995
We present a new model of parallel computation, the LogGP model, and use it to analyze a number of algorithms, most notably the single-node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP+93], which abstracts the communication of fixed-sized short messages through the use of four parameters: the communication latency (L), the overhead (o), the gap per message (g, whose reciprocal corresponds to the per-processor communication bandwidth), and the number of processors (P). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5) [CKP+93, CDMS94]. However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP-2, Paragon, Meiko CS-2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has o...
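A hedged sketch of the extension described above: LogGP adds a Gap per byte, G, so that one k-byte message costs roughly o + (k - 1)G + L + o end to end, whereas under plain LogP the same payload must be split into short messages paced by g. A small C comparison with invented parameter values:

#include <stdio.h>

/* LogGP long-message cost; G is the gap per byte (values invented). */
static double loggp_long_message(double L, double o, double G, long k) {
    return o + (k - 1) * G + L + o;   /* one k-byte message */
}

/* Same payload as ceil(k/w) pipelined w-byte LogP messages. */
static double logp_short_messages(double L, double o, double g,
                                  long k, long w) {
    long n = (k + w - 1) / w;
    return (n - 1) * g + 2 * o + L;
}

int main(void) {
    double L = 6.0, o = 2.0, g = 4.0, G = 0.05;  /* arbitrary units */
    long k = 4096, w = 8;
    printf("LogGP, one long message: %.1f\n", loggp_long_message(L, o, G, k));
    printf("LogP,  short messages  : %.1f\n",
           logp_short_messages(L, o, g, k, w));
    return 0;
}

With these made-up numbers the single long message wins by a wide margin, which is exactly the effect of the special long-message support the model is meant to capture.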
2006
We present a new parallel computation model called the Parallel Resource-Optimal (PRO) computation model. PRO is a framework being proposed to enable the design of efficient and scalable parallel algorithms in an architecture-independent manner, and to simplify the analysis of such algorithms. A focus on three key features distinguishes PRO from existing parallel computation models. First, the design and analysis of a parallel algorithm in the PRO model is performed relative to the time and space complexity of a sequential reference algorithm. Second, a PRO algorithm is required to be both time- and space-optimal relative to the reference sequential algorithm. Third, the quality of a PRO algorithm is measured by the maximum number of processors that can be employed while optimality is maintained. Inspired by the Bulk Synchronous Parallel model, an algorithm in the PRO model is organized as a sequence of supersteps. Each superstep consists of distinct computation and communication phases, but the supersteps are not required to be separated by synchronization barriers. Both computation and communication costs are accounted for in the runtime analysis of a PRO algorithm. Experimental results on parallel algorithms designed using the PRO model, and implemented using its accompanying programming environment SSCRAP, demonstrate that the model indeed delivers efficient and scalable implementations on a wide range of platforms.
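To make the optimality requirement concrete, here is one way to state it (our sketch following the usual PRO formulation; T(n) and S(n) denote the time and space of the sequential reference algorithm, and Grain(n) is the granularity measure used in the PRO literature):

\[
T_{\mathrm{par}}(n,p) = O\!\left(\frac{T(n)}{p}\right)
\quad\text{(time-optimal)},
\qquad
S_{\mathrm{par}}(n,p) = O\!\left(\frac{S(n)}{p}\right)\ \text{per processor}
\quad\text{(space-optimal)},
\]

with both bounds required to hold for every \(p \le \mathrm{Grain}(n)\); the quality of the algorithm is the largest such \(\mathrm{Grain}(n)\).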
Choice Reviews Online, 2004
The major parallel programming models for scalable parallel architectures are the message passing model and the shared memory model. This article outlines the main concepts of these models as well as the industry standard programming interfaces MPI and OpenMP. To exploit the potential performance of parallel computers, programs need to be carefully designed and tuned. We will discuss design decisions for good performance as well as programming tools that help the programmer in program tuning.
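For readers unfamiliar with the two interfaces, a minimal hybrid C sketch (ours, purely illustrative): MPI passes messages between distributed processes, while OpenMP parallelizes the loop inside each process.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Shared-memory parallelism inside each MPI process (OpenMP). */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = rank; i < 1000000; i += size)
        local += 1.0 / (1.0 + i);        /* arbitrary per-element work */

    /* Message passing between processes: combine the partial sums. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f\n", global);

    MPI_Finalize();
    return 0;
}

Compile with an MPI wrapper and OpenMP enabled (e.g. mpicc -fopenmp); the point is only that the two models compose, each handling the level of the machine it was designed for.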
2000
Large-scale parallel computations are more common than ever, due to the increasing availability of multi-processor systems. However, writing parallel software is often a complicated and error-prone task. To relieve Diffpack users of the tedious and low-level technical details of parallel programming, we have designed a set of new software modules, tools, and programming rules, which will be the topic of ...
…, University of California at Berkeley, Technical Report No. …
The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation.
Texts in Computational Science and Engineering, 2010
2008 DoD HPCMP Users Group Conference, 2008
Message passing, as implemented in the Message Passing Interface (MPI), has become the industry standard for programming on distributed-memory parallel architectures, while threading on shared-memory machines is typically implemented in OpenMP. Outstanding performance has been achieved with these methods, but only on a relatively small number of codes, requiring expert tuning, particularly as system size increases. With the advent of multicore/manycore microprocessors, and continuing scaling in the size of systems, an inflection point may be nearing that will require a substantial shift in the way large-scale systems are programmed in order to maintain productivity. The Parallel Paradigms project was undertaken to evaluate the near-term readiness of a number of emerging ideas in parallel programming, with specific emphasis on their applicability to applications in the User Productivity Enhancement and Technology Transfer (PET) Electronics, Networking, and Systems/C4I (ENS) focus area. The evaluation included examinations of usability, performance, scalability, and support for fault tolerance. Applications representative of ENS problems were ported to each of the evaluated languages, along with a set of "dwarf" codes representing a broader workload. In addition, a user study was undertaken in which teaching modules were developed for each paradigm and delivered to groups of both novice and expert programmers to measure productivity. Results will be presented from six paradigms currently under evaluation. Experiences with each of these models will be presented, including the performance of applications re-coded across these models and feedback from users.
Lecture Notes in Computer Science, 2001
This paper surveys and places into perspective a number of results concerning the D-BSP (Decomposable Bulk Synchronous Parallel) model of computation, a variant of the popular BSP model proposed by Valiant in the early nineties. D-BSP captures part of the proximity structure of the computing platform, modeling it by suitable decompositions into clusters, each characterized by its own bandwidth and latency parameters. Quantitative evidence is provided that, when modeling realistic parallel architectures, D-BSP achieves higher effectiveness and portability than BSP, without significantly affecting the ease of use. It is also shown that D-BSP avoids some of the shortcomings of BSP which motivated the definition of other variants of the model. Finally, the paper discusses how the aspects of network proximity incorporated in the model allow for a better management of network congestion and bank contention, when supporting a shared-memory abstraction in a distributed-memory environment.
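A hedged illustration of the cost structure just described: in BSP, a superstep with local work w and an h-relation costs roughly w + h·g + l; in D-BSP, a superstep confined to a level-i cluster is charged with that cluster's own parameters (g_i, l_i), which are typically smaller for smaller clusters. A toy C sketch of this accounting, with invented numbers:

#include <stdio.h>

/* Usual BSP superstep cost w + h*g + l, evaluated with the parameters
 * (g_i, l_i) of the cluster the superstep runs in. Values are invented. */
static double dbsp_superstep(double w, double h, double g_i, double l_i) {
    return w + h * g_i + l_i;
}

int main(void) {
    double g[] = {10.0, 4.0, 1.5};    /* level 0 = whole machine ... */
    double l[] = {200.0, 50.0, 8.0};  /* ... level 2 = smallest clusters */
    double w = 100.0, h = 20.0;
    for (int i = 0; i < 3; i++)
        printf("superstep at level %d costs %.1f\n", i,
               dbsp_superstep(w, h, g[i], l[i]));
    return 0;
}

The benefit claimed for D-BSP is exactly that communication which stays inside a small cluster is billed at the cluster's cheaper rates rather than at the whole machine's.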
…, University of California …, 2008
In December 2006, we published a broad survey of the issues for the whole field concerning the multicore/manycore sea change (see view.eecs.berkeley.edu) (Asanovic, Bodik et al. 2006). We view the ultimate goal as the ability to create efficient and correct software productively that scales smoothly as the number of cores per chip doubles biennially. This much shorter report covers the specific research agenda that a large group of us at Berkeley is going to follow. This report is based on a proposal for creating a Universal Parallel Computing Research Center (UPCRC) that a technical committee from Intel and Microsoft unanimously selected as the top proposal in a competition among the top 25 computer science departments. The five-year, $10M UPCRC forms the foundation for the U.C. Berkeley Parallel Computing Laboratory, or Par Lab, a multidisciplinary research project exploring the future of parallel processing (see parlab.eecs.berkeley.edu).

To take a fresh approach to the longstanding parallel computing problem, our research agenda will be driven by compelling applications developed by domain experts. Past efforts to resolve these challenges have often been driven "bottom-up" from the hardware, with applications an afterthought. We will focus on exciting new applications that need much more computing horsepower to run well, rather than on legacy programs that already run well on today's computers. Our applications are in the areas of personal health, image retrieval, music, speech understanding, and web browsers.

The development of parallel software is the heart of our research agenda. The task will be divided into two layers: an efficiency layer that aims at low overhead for 10 percent of the best programmers, and a productivity layer for the rest of the programming community, including domain experts, that reuses the parallel software developed at the efficiency layer. Key to this approach is a layer of libraries and programming frameworks centered on the 13 computational bottlenecks ("motifs") that we identified in the original Berkeley View report (Asanovic, Bodik et al. 2006). We will also create a Composition and Coordination Language to make it easier to compose these components. Finally, we will rely on autotuning to map the software efficiently to a particular parallel computer; past attempts have often relied on a single programming abstraction and language for all programmers and on automatically parallelizing compilers.

The role of the operating system and the architecture in this project is to support software and applications in achieving the ultimate goal, rather than the conventional approach of fixing the environment in which parallel software must survive. Example innovations include very thin hypervisors, which allow user-level control of processor scheduling, and hardware support for partitioning and fast barrier synchronization. We will prototype the hardware of the future using field-programmable gate arrays (FPGAs), which we believe are fast enough to be interesting to parallel software researchers, yet flexible enough to "tape out" new designs every day, while being cheap enough that university researchers can afford to construct systems containing hundreds of processors. This prototyping infrastructure is called RAMP (Research Accelerator for Multiple Processors) and is being developed by a consortium of universities and companies (see ramp.eecs.berkeley.edu).
This paper surveys current trends in parallel computing, covering all aspects of a parallel computing system. A large computational problem that cannot be solved by a single CPU can be divided into sufficiently small subtasks that are processed simultaneously by a parallel computer. A parallel computer consists of parallel computing hardware, a parallel computing model, and software support for parallel programming. Parallel performance measurement parameters and parallel benchmarks are used to measure the performance of a parallel computing system. The hardware and software are specially designed for parallel algorithms and programming. This paper explores all these aspects of parallel computing and their usefulness.
2014
In the history of the computational world, sequential uni-processor computers were exploited for years to solve scientific and business problems. To satisfy the demands of compute- and data-hungry applications, it was observed that better response times could be achieved only through parallelism. Large computational problems were partitioned and solved by using multiple CPUs in parallel. Computing performance was further improved by adopting multi-core architectures, which provide hardware parallelism through the use of multiple cores. Efficient resource utilization of a parallel computing environment through software and hardware parallelism is a major research challenge. Present hardware technologies give algorithm developers freedom to control and manage resources through software code, such as thread-to-core mapping in recent multi-core processors. In this paper, a survey is presented from the beginning of parallel computing up to the use of present state-of-the-art multi-core...
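As a concrete instance of the software control of resources mentioned above, threads can be pinned to specific cores on GNU/Linux through the (non-portable) pthread affinity interface; a minimal sketch:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg) {
    (void)arg;
    printf("worker running on CPU %d\n", sched_getcpu());
    return NULL;
}

int main(void) {
    /* Restrict the new thread to core 1 before it starts. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setaffinity_np(&attr, sizeof(set), &set);

    pthread_t t;
    pthread_create(&t, &attr, worker, NULL);
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}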
Journal of Parallel and Distributed Computing, 1991
The many revolutionary changes brought about by the integrated chip, in the form of significant improvements in processing, storage, and communications, have also brought about a host of related problems for designers and users of parallel and distributed systems. These systems develop and proliferate at an amazing momentum, motivating research in the understanding and testing of complex distributed systems. Unfortunately, these relatively expensive systems are being designed, built, used, refined, and rebuilt (at perhaps an avoidable expense) even before we have developed a methodology for understanding the underlying principles of their behavior. Though it is not realistic to expect that the current rate of manufacturing can be slowed down to accommodate research in design principles, it behooves us to bring attention to the importance of design methodology and performance understanding of such systems and, in this way, to attempt to influence parallel system design in a positive manner. At the present time, there is considerable debate among various schools of thought on parallel machine architectures, with different schools proposing different architectures and design philosophies. Consider, for example, one such debate involving tightly coupled systems. Early on, Minsky [1] conjectured a somewhat pessimistic bound of log n for typical speedup on n processors. Since then, researchers [2] have shown that certain characteristics of programs, such as the DO loops in Fortran, can often be exploited to yield more optimistic levels of speedup. Other researchers [3] counter this kind of optimism by pointing out that parallel and vector processing has limitations in potential speedup (i.e., Amdahl's law), to the extent that speedup is bounded from above by n/(s·n + (1 − s)), where s is the fraction of a computation that must be done serially. This suggests that it makes more sense to first concentrate on achieving the maximum speedup possible with a single powerful processor. In this view, the distributed approach is not as attractive an option. More recently, work on hypercubes [4] appears to indicate that...
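A quick numeric check of that bound, in C (the 5% serial fraction is just an example):

#include <stdio.h>

/* Amdahl's law: with serial fraction s, speedup on n processors is
 * bounded above by n / (s*n + (1 - s)). */
static double amdahl(double s, int n) {
    return n / (s * n + (1.0 - s));
}

int main(void) {
    double s = 0.05;                      /* 5% strictly serial */
    int procs[] = {1, 10, 100, 1000};
    for (int i = 0; i < 4; i++)
        printf("n=%4d: speedup <= %6.2f\n", procs[i], amdahl(s, procs[i]));
    printf("limit as n grows: %.2f (= 1/s)\n", 1.0 / s);
    return 0;
}

Even with 95% of the work parallelizable, a thousand processors yield a speedup below 20, which is the argument for strengthening the single processor first.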
LCPC, 1995
It is our pleasure to present the papers from the 20th International Workshop on Languages and Compilers for Parallel Computing! For the past 19 years, this workshop has been one of the primary venues for presenting and learning about a wide range of current research in parallel computing. We believe that tradition has continued in this, the 20th year of the workshop.
1998
We survey parallel programming models and languages using six criteria to assess their suitability for realistic portable parallel programming. We argue that an ideal model should be easy to program, should have a software development methodology, should be architecture-independent, should be easy to understand, should guarantee performance, and should provide accurate information about the cost of programs.
2007 IEEE International …, 2007
This paper proposes the study of a new computation model that attempts to address the underlying sources of performance degradation (e.g. latency, overhead, and starvation) and the difficulties of programmer productivity (e.g. explicit locality management and scheduling, performance tuning, fragmented memory, and synchronous global barriers) to dramatically enhance the broad effectiveness of parallel processing for high end computing. In this paper, we present the progress of our research on a parallel programming and execution model -mainly, ParalleX. We describe the functional elements of ParalleX, one such model being explored as part of this project. We also report our progress on the development and study of a subset of ParalleX -the LITL-X at University of Delaware. We then present a novel architecture model -Gilgamesh II -as a ParalleX processing architecture. A design point study of Gilgamesh II and the architecture concept strategy are presented.
Theory of Computing Systems / Mathematical Systems Theory, 1999
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations.
Communications of the ACM, 2009
1995
The best enterprises have both a compelling need pulling them forward and an innovative technological solution pushing them on. In high-performance computing, we have the need for increased computational power in many applications, and the inevitable long-term solution is massive parallelism. In the short term, the relation between pull and push may seem unclear, as novel algorithms and software are needed to support parallel computing.