Advances in Science, Technology and Engineering Systems Journal
Like all other laws of growth in computing, the growth of computing performance also shows a "logistic curve"-like behavior rather than unlimited exponential growth. The stalling of single-processor performance experienced nearly two decades ago forced computer experts to look for alternative methods, mainly some kind of parallelization. Different tasks need different parallelization methods, and the wide range of such distributed systems limits computing performance in very different ways. Some general limitations are briefly discussed, and a deliberately strongly simplified general model of the performance of parallelized systems is introduced. The model makes it possible to highlight bottlenecks of different kinds of parallelized systems and, together with published performance data, to predict the performance limits of strongly parallelized systems such as large-scale supercomputers and neural networks. Some alternative possibilities for increasing computing performance are also discussed.
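The abstract does not reproduce the paper's model; as a minimal sketch of the qualitative claim in its first sentence, the Python snippet below (with invented growth rate, ceiling, and midpoint parameters) contrasts unlimited exponential growth with a logistic curve that saturates at a ceiling.

```python
# Minimal illustration only; the paper's actual model and parameters are not given
# in the abstract, so the growth rate r, ceiling K, and midpoint t0 are invented.
import math

def exponential_growth(t, x0=1.0, r=0.5):
    """Unlimited exponential growth: x(t) = x0 * exp(r*t)."""
    return x0 * math.exp(r * t)

def logistic_growth(t, K=1000.0, r=0.5, t0=14.0):
    """Logistic ('S-shaped') growth that saturates at the ceiling K."""
    return K / (1.0 + math.exp(-r * (t - t0)))

# Early on the two curves are hard to tell apart; later the logistic one stalls.
for t in range(0, 29, 4):
    print(t, round(exponential_growth(t), 1), round(logistic_growth(t), 1))
```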
Journal of Parallel and Distributed Computing, 1993
There are several metrics that characterize the performance of a parallel system, such as parallel execution time, speedup, and efficiency. A number of properties of these metrics have been studied. For example, it is a well-known fact that, given a parallel architecture and a problem of a fixed size, the speedup of a parallel algorithm does not continue to increase with an increasing number of processors. It usually tends to saturate or peak at a certain limit. Thus it may not be useful to employ more than an optimal number of processors for solving a problem on a parallel computer. This optimal number of processors depends on the problem size, the parallel algorithm, and the parallel architecture. In this paper we study the impact of parallel processing overheads and the degree of concurrency of a parallel algorithm on the optimal number of processors to be used when the criterion for optimality is minimizing the parallel execution time. We then study a more general criterion of optimality and show how operating at the optimal point is equivalent to operating at a unique value of efficiency which is characteristic of the criterion of optimality and the properties of the parallel system under study. We put the technical results derived in this paper in perspective with similar results that have appeared in the literature before and show how this paper generalizes and/or extends these earlier results.
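The abstract does not give the paper's own cost model, but the saturation effect it describes can be illustrated with a minimal, hypothetical model in which the parallel execution time is the work divided among p processors plus an overhead that grows with p; the optimal processor count and the efficiency at that point then follow directly.

```python
# Hypothetical cost model, not taken from the paper: T(p) = W/p + c*p
# (work divided among p processors plus an overhead growing linearly in p).
# The execution time is minimized at p* = sqrt(W/c); beyond that point,
# adding processors increases the running time, so speedup peaks.
import math

def parallel_time(p, work, overhead_coeff):
    return work / p + overhead_coeff * p

def speedup(p, work, overhead_coeff):
    return parallel_time(1, work, overhead_coeff) / parallel_time(p, work, overhead_coeff)

def efficiency(p, work, overhead_coeff):
    return speedup(p, work, overhead_coeff) / p

work, c = 1_000_000.0, 1.0
p_opt = int(round(math.sqrt(work / c)))   # analytic optimum for this toy model
for p in (10, 100, p_opt, 10_000, 100_000):
    print(p, round(speedup(p, work, c), 1), round(efficiency(p, work, c), 3))
```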
Texts in Computational Science and Engineering, 2010
Journal of Parallel and Distributed Computing, 1991
The many revolutionary changes brought about by the integrated chip, in the form of significant improvements in processing, storage, and communications, have also brought about a host of related problems for designers and users of parallel and distributed systems. These systems develop and proliferate at an amazing momentum, motivating research in the understanding and testing of complex distributed systems. Unfortunately, these relatively expensive systems are being designed, built, used, refined, and rebuilt (at perhaps an avoidable expense) even before we have developed a methodology for understanding the underlying principles of their behavior. Though it is not realistic to expect that the current rate of manufacturing can be slowed down to accommodate research in design principles, it behooves us to bring attention to the importance of design methodology and performance understanding of such systems and, in this way, to attempt to influence parallel system design in a positive manner. At the present time, there is considerable debate among various schools of thought on parallel machine architectures, with different schools proposing different architectures and design philosophies. Consider, for example, one such debate involving tightly coupled systems. Early on, Minsky [1] conjectured a somewhat pessimistic bound of log n for typical speedup on n processors. Since then, researchers [2] have shown that certain characteristics of programs, such as the DO loops in Fortran, can often be exploited to yield more optimistic levels of speedup. Other researchers [3] counter this kind of optimism by pointing out that parallel and vector processing has limitations in potential speedup (i.e., Amdahl's law), to the extent that speedup is bounded from above by n/(s·n + 1 - s), where s is the fraction of a computation that must be done serially. This suggests that it makes more sense to first concentrate on achieving the maximum speedup possible with a single powerful processor. In this view, the distributed approach is not as attractive an option. More recently, work on hypercubes [4] appears to indicate that
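For reference, the Amdahl bound quoted above, n/(s·n + 1 - s), is equivalent to 1/(s + (1 - s)/n) and approaches 1/s as n grows. The short sketch below, with an arbitrary example value of s, tabulates the bound.

```python
# Amdahl's bound as quoted above: speedup on n processors is at most
# n / (s*n + 1 - s), equivalently 1 / (s + (1 - s)/n), where s is the
# serial fraction. As n grows, the bound approaches 1/s.

def amdahl_bound(n, s):
    return n / (s * n + 1 - s)

s = 0.05  # example serial fraction (5% of the work must run serially)
for n in (1, 4, 16, 64, 256, 1024):
    print(n, round(amdahl_bound(n, s), 2))
print("asymptotic limit 1/s =", 1 / s)
```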
1995
The best enterprises have both a compelling need pulling them forward and an innovative technological solution pushing them on. In high-performance computing, we have the need for increased computational power in many applications, and the inevitable long-term solution is massive parallelism. In the short term, the relation between pull and push may seem unclear, as novel algorithms and software are needed to support parallel computing.
2010
This thesis reviews selected topics from the theory of parallel computation. The research begins with a survey of the proposed models of parallel computation. It examines the characteristics of each model and discusses its use for either theoretical studies or practical applications. Subsequently, it employs common simulation techniques to evaluate the computational power of these models. The simulations establish certain model relations before advancing to a detailed study of parallel complexity theory, which is the subject of the second part of this thesis. The second part examines classes of feasible, highly parallel problems and investigates the limits of parallelization. It is concerned with the benefits of parallel solutions and the extent to which they can be applied to all problems. It analyzes the parallel complexity of various well-known tractable problems and discusses the automatic parallelization of efficient sequential algorithms. Moreover, it ...
ratio rather than on the high performance as in scientific applications.
Parallel computing has become an essential subject in the field of computer science and has proven to be critical when researching high-end solutions. With the evolution of computer architectures (multicore and manycore) towards an increased number of cores, parallelism has become the approach of choice for speeding up an algorithm. Over the last few decades, the graphics processing unit (GPU) has gained an essential place in the field of high-performance computing (HPC) due to its low cost and massive parallel processing power. In this paper, we survey the idea of parallel computing, especially GPU computing and its programming models, and also give a couple of theoretical and technical concepts that are often needed to understand the GPU and its massive parallelism. In particular, we show how this technology assists the field of computational physics, especially when the problem is data-parallel.
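As a toy illustration of the data-parallel pattern the survey refers to (a CPU-side Python sketch rather than GPU code, with an invented element-wise function), the same operation is applied independently to every element of an array, so the elements can be distributed across workers.

```python
# Toy data-parallel example (CPU-side, for illustration only; a GPU version would
# launch one thread per element instead of using a process pool).
from multiprocessing import Pool

def update_cell(x):
    """Invented element-wise operation: each element is updated independently,
    which is exactly what makes the problem data-parallel."""
    return 0.5 * x + 1.0

if __name__ == "__main__":
    data = [float(i) for i in range(1_000_000)]
    with Pool() as pool:                      # one worker per available core
        result = pool.map(update_cell, data, chunksize=10_000)
    print(result[:5])
```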
IBM Systems Journal, 1995
were rapidly developed and moved to market, and illustrate how some customers are using them in their businesses.
In this paper, a survey of current trends in parallel computing is presented that depicts all aspects of a parallel computing system. A large computational problem that cannot be solved by a single CPU can be divided into small enough subtasks that are processed simultaneously by a parallel computer. The parallel computer consists of parallel computing hardware, a parallel computing model, and software support for parallel programming. Parallel performance measurement parameters and parallel benchmarks are used to measure the performance of a parallel computing system. The hardware and the software are specially designed for parallel algorithms and programming. This paper explores all aspects of parallel computing and its usefulness.
2000
Large-scale parallel computations are more common than ever, due to the increasing availability of multi-processor systems. However, writing parallel software is often a complicated and error-prone task. To relieve Diffpack users of the tedious and low-level technical details of parallel programming, we have designed a set of new software modules, tools, and programming rules, which will be the topic of
arXiv (Cornell University), 2016
With the spread of multi- and many-core processors, an increasingly typical task is to re-implement source code originally written for a single processor so that it runs on more than one core. Since this is a serious investment, it is important to decide how much the effort pays off and whether the resulting implementation performs as well as it could. Amdahl's law provides theoretical upper limits for the performance gain reachable through parallelizing the code, but it requires detailed architectural knowledge of the program code, does not consider the housekeeping activity needed for parallelization, and cannot tell how the actual stage of the parallelization implementation performs. The present paper suggests a quantitative measure for that goal. This figure of merit is derived experimentally from measured running times and the number of threads/cores. It can be used to quantify the parallelization technology used, the connection between the computing units, the acceleration technology under the given conditions, or the performance of the software team/compiler.
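The abstract does not define the figure of merit itself; a common empirical quantity that can be computed from the same inputs (measured running times and the number of cores) is the efficiency E(k) = T(1)/(k·T(k)), sketched below with made-up timing data.

```python
# Hedged sketch: the paper's own figure of merit is not reproduced here.
# The classical empirical efficiency E(k) = T(1) / (k * T(k)) uses exactly the
# inputs the abstract names: measured running times and the number of threads/cores.
# The timing values below are invented for illustration.

measured_runtimes = {   # cores -> wall-clock seconds (made-up numbers)
    1: 100.0,
    2: 52.0,
    4: 28.0,
    8: 16.5,
    16: 11.0,
}

t1 = measured_runtimes[1]
for cores, t in sorted(measured_runtimes.items()):
    speedup = t1 / t
    efficiency = speedup / cores
    print(f"{cores:>2} cores: speedup {speedup:5.2f}, efficiency {efficiency:5.2f}")
```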
2014
In the history of the computational world, sequential uni-processor computers were exploited for years to solve scientific and business problems. To satisfy the demands of compute- and data-hungry applications, it was observed that better response times can be achieved only through parallelism. Large computational problems were partitioned and solved by using multiple CPUs in parallel. Computing performance was further improved by adopting multi-core architectures, which provide hardware parallelism through the use of multiple cores. Efficient resource utilization of a parallel computing environment by using software and hardware parallelism is a major research challenge. Present hardware technologies give algorithm developers the freedom to control and manage resources through software code, such as threads-to-cores mapping in recent multi-core processors. In this paper, a survey is presented from the beginning of parallel computing up to the use of present state-of-the-art multi-core...
2000
In December 2006, we published a broad survey of the issues for the whole field concerning the multicore/manycore sea change (see view.eecs.berkeley.edu). We view the ultimate goal as the ability to create efficient and correct software productively that scales smoothly as the number of cores per chip doubles biennially. This much shorter report covers the specific research agenda that a large group of us at Berkeley is going to follow. This report is based on a proposal for creating a Universal Parallel Computing Research Center (UPCRC) that a technical committee from Intel and Microsoft unanimously selected as the top proposal in a competition with the top 25 computer science departments. The five-year, $10M UPCRC forms the foundation for the U.C. Berkeley Parallel Computing Laboratory, or Par Lab, a multidisciplinary research project exploring the future of parallel processing (see parlab.eecs.berkeley.edu).
To take a fresh approach to the longstanding parallel computing problem, our research agenda will be driven by compelling applications developed by domain experts. Historically, past efforts to resolve these challenges have often been driven "bottom-up" from the hardware, with applications an afterthought. We will focus on exciting new applications that need much more computing horsepower to run well, rather than on legacy programs that already run well on today's computers. Our applications are in the areas of personal health, image retrieval, music, speech understanding, and web browsers.
The development of parallel software is the heart of our research agenda. The task will be divided into two layers: an efficiency layer that aims at low overhead for 10 percent of the best programmers, and a productivity layer for the rest of the programming community (including domain experts) that reuses the parallel software developed at the efficiency layer. Key to this approach is a layer of libraries and programming frameworks centered on the 13 computational bottlenecks ("motifs") that we identified in the original Berkeley View report. We will also create a Composition and Coordination Language to make it easier to compose these components. Finally, we will rely on autotuning to map the software efficiently to a particular parallel computer. Past attempts have often relied on a single programming abstraction and language for all programmers and on automatically parallelizing compilers.
The role of the operating system and the architecture in this project is to support software and applications in achieving the ultimate goal, rather than the conventional approach of fixing the environment in which parallel software must survive. Example innovations include very thin hypervisors, which allow user-level control of processor scheduling, and hardware support for partitioning and fast barrier synchronization. We will prototype the hardware of the future using field-programmable gate arrays (FPGAs), which we believe are fast enough to be interesting to parallel software researchers, yet flexible enough to "tape out" new designs every day, while being cheap enough that university researchers can afford to construct systems containing hundreds of processors. This prototyping infrastructure is called RAMP (Research Accelerator for Multiple Processors), and is being developed by a consortium of universities and companies (see ramp.eecs.berkeley.edu).
… , University of California at Berkeley, Technical Report No. …
The recent switch to parallel microprocessors is a milestone in the history of computing. Industry has laid out a roadmap for multicore designs that preserves the programming paradigm of the past via binary compatibility and cache coherence. Conventional wisdom is now to double the number of cores on a chip with each silicon generation.