2005, Statistics: A Series of Textbooks and Monographs
This paper discusses the evolution and significance of parallel computation in contemporary high-performance computing environments. It highlights the transition from traditional supercomputers to massively parallel computers using commodity CPU chips, addressing their architectures, efficiency, and practical applications in data-intensive fields such as data mining and scientific computation. The document also explores specific types of parallel computing devices, including pipeline processors and their operational efficiencies.
Journal of Parallel and Distributed Computing, 1991
The many revolutionary changes brought about by the integrated chip, in the form of significant improvements in processing, storage, and communications, have also brought about a host of related problems for designers and users of parallel and distributed systems. These systems develop and proliferate at an amazing momentum, motivating research in the understanding and testing of complex distributed systems. Unfortunately, these relatively expensive systems are being designed, built, used, refined, and rebuilt (at perhaps an avoidable expense) even before we have developed a methodology for understanding the underlying principles of their behavior. Though it is not realistic to expect that the current rate of manufacturing can be slowed down to accommodate research in design principles, it behooves us to bring attention to the importance of design methodology and performance understanding of such systems and, in this way, to attempt to influence parallel system design in a positive manner. At the present time, there is considerable debate among various schools of thought on parallel machine architectures, with different schools proposing different architectures and design philosophies. Consider, for example, one such debate involving tightly coupled systems. Early on, Minsky [1] conjectured a somewhat pessimistic bound of log n for typical speedup on n processors. Since then, researchers [2] have shown that certain characteristics of programs, such as the DO loops in Fortran, can often be exploited to yield more optimistic levels of speedup. Other researchers [3] counter this kind of optimism by pointing out that parallel and vector processing has limitations in potential speedup (i.e., Amdahl's law), to the extent that speedup is bounded from above by n/(sn + 1 - s), where s is the fraction of a computation that must be done serially. This suggests that it makes more sense to first concentrate on achieving the maximum speedup possible with a single powerful processor. In this view, the distributed approach is not as attractive an option. More recently, work on hypercubes [4] appears to indicate that
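The Amdahl bound quoted in this abstract is easy to evaluate numerically. A minimal sketch in plain Python (the formula is taken from the abstract; the serial fraction and processor counts are illustrative):

def amdahl_speedup(n: int, s: float) -> float:
    """Upper bound on speedup with n processors and serial fraction s."""
    return n / (s * n + 1 - s)

if __name__ == "__main__":
    # Even a small serial fraction caps speedup: with s = 0.05,
    # the bound approaches 1/s = 20 no matter how many processors are added.
    for n in (1, 4, 16, 64, 256, 1024):
        print(f"n = {n:5d}  bound = {amdahl_speedup(n, 0.05):6.2f}")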
Texts in Computational Science and Engineering, 2010
Journal of Parallel and Distributed Computing, 1994
We discuss issues pertinent to performance analysis of massively parallel systems. We first argue that single-parameter characterization of parallel software or of parallel hardware rarely provides insight into the complex interactions among the software and hardware components of a parallel system. In particular, bounds for the speedup based upon simple models of parallelism are violated when a model ignores the effects of communication.
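To illustrate the point about communication, consider a toy fixed-size speedup model with an explicit communication term. This is a generic illustration of how such a term deflates the bound, not the model analyzed in the paper:

import math

# Toy model: serial fraction s, perfectly parallel fraction (1 - s),
# and a communication term kappa*log2(n) modeling, for example,
# tree-structured reductions. Illustrative only, not the paper's model.

def speedup(n: int, s: float, kappa: float = 0.0) -> float:
    comm = kappa * math.log2(n) if n > 1 else 0.0
    return 1.0 / (s + (1.0 - s) / n + comm)

for n in (2, 8, 32, 128, 512):
    print(f"n={n:4d}  ideal={speedup(n, 0.02):5.1f}  "
          f"with comm={speedup(n, 0.02, kappa=0.005):5.1f}")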
1990
In this paper we analyze a model of a parallel processing system. In our model there is a single queue which is served by K ≥ 1 identical processors. Jobs are assumed to consist of a sequence of barrier synchronizations where, at each step, the number of tasks that must be synchronized is random with a known distribution. An exact analysis of the model is derived. The model leads to a rich set of results characterizing the performance of parallel processing systems. We show that the number of jobs concurrently in execution, as well as the number of synchronization variables, grows linearly with the load of the system and strongly depends on the average number of parallel tasks found in the workload. Properties of expected response time of such systems are extensively analyzed and, in particular, we report on some non-obvious response time behavior that arises as a function of the variance of parallelism found in the workload. Based on exact response time analysis, we propose a simple calculation that can be used as a rule of thumb to predict speedups. This can be viewed as a generalization of Amdahl's law that includes queueing effects. This generalization is reformulated when precise workloads cannot be characterized, but rather when only the fraction of sequential work and the average number of parallel tasks are assumed to be known.
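The abstract does not reproduce the proposed rule of thumb itself. A well-known bound of the same flavor, due to Eager, Zahorjan, and Lazowska, predicts a speedup lower bound from the average parallelism A alone; the sketch below uses it as a stand-in for the paper's own calculation:

# With average parallelism A, speedup on n processors satisfies
#     S(n) >= n * A / (n + A - 1)
# (Eager, Zahorjan, and Lazowska, 1989). A stand-in for the paper's
# own rule of thumb, which the abstract does not reproduce.

def speedup_lower_bound(n: int, avg_parallelism: float) -> float:
    a = avg_parallelism
    return n * a / (n + a - 1)

# With A = 10, the predicted speedup saturates near A as n grows.
for n in (1, 2, 4, 16, 64, 256):
    print(f"n={n:4d}  S >= {speedup_lower_bound(n, 10.0):5.2f}")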
Proceedings of the June 7-10, 1982, national computer conference on - AFIPS '82, 1982
We discuss a parallel-processing experiment that uses a particle-in-cell (PIC) code to study the feasibility of doing large-scale scientific calculations on multiple-processor architectures. A multithread version of this Los Alamos PIC code was successfully implemented and timed on a UNIVAC System 1100/80 computer. Use of a single copy of the instruction stream, and common memory to hold data, eliminated data transmission between processors. The multiple-processing algorithm exploits the PIC code's high degree of large, independent tasks, as well as the configuration of the UNIVAC System 1100/80. Timing results for the multithread version of the PIC code using one, two, three, and four identical processors are given and are shown to have promising speedup times when compared to the overall run times measured for a single-thread version of the PIC code.
arXiv (Cornell University), 2016
With the spread of multi- and many-core processors, an increasingly common task is to re-implement source code originally written for a single processor so that it runs on multiple cores. Since this is a serious investment, it is important to decide how much effort pays off, and whether the resulting implementation performs as well as it could. Amdahl's law provides theoretical upper limits for the performance gain reachable through parallelizing the code, but it requires detailed architectural knowledge of the program code, does not consider the housekeeping activity needed for parallelization, and cannot tell how the actual stage of the parallelization implementation performs. The present paper suggests a quantitative measure for that goal. This figure of merit is derived experimentally, from measured running times and the number of threads/cores. It can be used to quantify the parallelization technology used, the connection between the computing units, the acceleration technology under the given conditions, or the performance of the software team/compiler.
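The abstract does not spell out the proposed figure of merit. A related, widely used experimental measure is the Karp-Flatt metric, which back-solves Amdahl's law for an effective serial fraction from measured running times; the sketch below shows that metric, not necessarily the paper's:

# Karp-Flatt experimentally determined serial fraction: given the
# measured speedup psi on p cores, solve Amdahl's law backwards for
# the effective serial fraction e. A well-known related metric,
# not necessarily the figure of merit proposed in the paper.

def karp_flatt(t1: float, tp: float, p: int) -> float:
    """Effective serial fraction from runtimes t1 (1 core) and tp (p cores)."""
    psi = t1 / tp                        # measured speedup
    return (1 / psi - 1 / p) / (1 - 1 / p)

# Example: 100 s on one core, 16 s on eight cores => speedup 6.25.
print(karp_flatt(100.0, 16.0, 8))        # ~0.04: small effective serial part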
Proceedings of the IEEE, 1991
A great many scientific models possess a high degree of inherent parallelism. For simulation purposes this may often be exploited by employing a massively parallel SIMD computer. We describe one such computer, the Distributed Array Processor (DAP), and discuss the optimal mapping of a typical problem onto the computer architecture to best exploit the model parallelism. By focussing on specific models currently under study, we exemplify the types of problem which benefit most from a parallel implementation. The extent of this benefit is considered relative to implementation on a machine of conventional architecture.
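As a small (and anachronistic) illustration of the data-parallel style such SIMD machines encourage, the sketch below performs a whole-array stencil update, applying one logical operation to every lattice site at once. NumPy stands in for the DAP's processor array here and is an assumption, not the paper's code:

import numpy as np

# Data-parallel lattice update in the SIMD style: one logical operation
# applied to every interior site of the grid simultaneously.

def jacobi_step(u: np.ndarray) -> np.ndarray:
    """One Jacobi relaxation sweep over the interior of a 2-D grid."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

u = np.zeros((64, 64))
u[0, :] = 1.0                   # boundary condition on one edge
for _ in range(100):
    u = jacobi_step(u)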
2013 IEEE 16th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), 2013
Lecture Notes in Computer Science, 2012
IEEE Transactions on Computers, 1988
A software tool for measuring parallelism in large scientific/engineering applications is described in this paper. The proposed tool measures the total parallelism present in programs, filtering out the effects of communication/synchronization delays, finite storage, the limited number of processors, the policies for management of processors and storage, etc. Although an ideal machine which can exploit the total parallelism is not realizable, such measures would aid the calibration and design of various architectures/compilers. The proposed software tool accepts ordinary Fortran programs as its input. Therefore, parallelism can be measured easily on many fairly big programs. Some measurements of parallelism obtained with the help of this tool are also reported. It is observed that the average parallelism in the chosen programs is in the range of 500-3500 Fortran statements executing concurrently in each clock cycle in an idealized environment.
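The abstract does not detail the tool's algorithm, but the standard idealized definition of parallelism is total work divided by the critical-path length of the program's dependence graph. A minimal sketch under that assumption, with a hypothetical four-statement dependence DAG:

from functools import lru_cache

# Average parallelism on an ideal machine: total work divided by the
# critical-path (longest dependence chain) length. The DAG below is a
# hypothetical stand-in for what such a tool extracts from Fortran code.

def average_parallelism(cost: dict, deps: dict) -> float:
    """cost[v] = cycles for statement v; deps[v] = statements v waits on."""
    @lru_cache(maxsize=None)
    def finish(v: str) -> float:          # earliest finish time of v
        return cost[v] + max((finish(u) for u in deps[v]), default=0.0)

    work = sum(cost.values())
    span = max(finish(v) for v in cost)   # critical-path length
    return work / span

cost = {"a": 1, "b": 1, "c": 1, "d": 1}
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(average_parallelism(cost, deps))    # 4 units of work / span 3 => 1.33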
Computer, 2000
1996
These lecture notes, under development and constant revision, like the field itself, have been used at MIT in a graduate course first offered by Alan Edelman and Shang-Hua Teng during the spring of 1994 (MIT 18.337, Parallel Scientific Computing). This first class had about forty students from a variety of disciplines, including Applied Mathematics, Computer Science, Mechanical Engineering, Chemical Engineering, Aeronautics and Aerospace, and Applied Physics. Because of the diverse backgrounds of the students, the course, by necessity, was designed to be of interest to engineers, computer scientists, and applied mathematicians.

Our course covers a mixture of material that we feel students should be exposed to. Our primary focus is on modern numerical algorithms for scientific computing, and also on the historical trends in architectures. At the same time, we have always felt that students and the professors must suffer through hands-on experience with modern parallel machines. Some students enjoy fighting new machines; others scream and complain. This is the reality of the subject. In 1995, the course was taught again by Alan Edelman with an additional emphasis on the use of portable parallel software tools. The sad truth was that there were not yet enough fully developed tools to be used. The situation is currently improving.

During 1994 and 1995 our students programmed the 128-node Connection Machine CM5. This machine was the 35th most powerful computer in the world in 1994; then the very same machine was the 74th most powerful machine in the spring of 1995. At the time of writing, December 1995, this machine has sunk to position 136. The fastest machine in the world is currently in Japan. In the 1996 course we used the IBM SP-2 and Boston University's SGI machines.

In addition to coauthors Shang-Hua Teng and Robert Schreiber, I would like to thank our numerous students who have written and commented on these notes and have also prepared many of the diagrams. We also thank the students from Minnesota SCIC 8001 and the summer course held at MIT, Summer 6.50s, also taught by Rob Schreiber, for all of their valuable suggestions. These notes will probably evolve into a book which will eventually be coauthored by Rob Schreiber and Shang-Hua Teng. Meanwhile, we are fully aware that the 1996 notes are incomplete, contain mathematical and grammatical errors, and do not cover everything we wish. They are an improvement over the 1995 notes, but not as good as the 1997 notes will be. I view these notes as a basis on which to improve, not as a completed book.

It has been our experience that some students of pure mathematics and theoretical computer science are a bit fearful of programming real parallel machines. Students of engineering and computer science are sometimes intimidated by mathematics. The most successful students understand that computing is not "dirty" and mathematical knowledge is not "scary" or "useless," but both require hard work and maturity to master. The good news is that there are many jobs, both in the industrial and academic sectors, for experts in the field!

A good course should have a good theme. We try to emphasize the fundamental algorithmic ideas and machine design principles. We have seen computer vendors come and go, but we believe that the mathematical, algorithmic, and numerical ideas discussed in these notes provide a solid foundation that will last for many years.
Science, 1990
Highly parallel computing architectures are the only means to achieve the computational rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are best suited for particular classes of problems. The architectures designated as MIMD and SIMD have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed. Computation has emerged as an important new method in science. It gives access to solutions of fundamental problems that pure analysis and pure experiment cannot reach. Aerospace engineers, for example, estimate that a complete numerical simulation of an aircraft in flight could be performed in a matter of hours on a supercomputer capable of sustaining at least 1 trillion floating point operations per second (teraflops, or tflops). Researchers in materials analysis, oil exploration, circuit design, visual recognition, high-energy physics, cosmology, earthquake prediction, atmospherics, oceanography, and other disciplines report that breakthroughs are likely with machines that can compute at a tflops rate.
Concurrency: Practice and Experience, 1989
Parallel supercomputers are now in regular use at Caltech for several major scientific calculations. We use this experience to abstract a set of lessons for applications, decomposition, performance, hardware and software. We consider hypercubes, transputer arrays and the SIMD Connection Machine CM-2 and AMT DAP. These are contrasted, where possible, with CRAY and other high performance conventional computers.
2012 20th Telecommunications Forum (TELFOR), 2012
As the information society changes, the digital world makes use of ever larger volumes of data and more complex operations that need to be executed. This trend has led to overcoming processor speed limits by introducing multiple-processor systems. Alongside hardware-level parallelism, software has evolved various techniques for achieving parallel program execution. Executing a program in parallel can be done efficiently only if the program code follows certain rules. There are many techniques, and they tend to provide varying processing speeds. The aim of this paper is to test the Matlab, OpenMPI and Pthreads methods on single-processor, multi-processor, GRID and cluster systems and to suggest the optimal method for each particular system.
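As a rough illustration of the kind of measurement involved, the sketch below times the same workload serially and with a process pool and reports the speedup. Python's multiprocessing is a stand-in here for the Matlab, OpenMPI and Pthreads methods actually tested:

import time
from multiprocessing import Pool

# Times an embarrassingly parallel workload serially and with a process
# pool, then reports speedup. multiprocessing stands in for the
# Matlab/OpenMPI/Pthreads methods compared in the paper.

def work(n: int) -> int:
    return sum(i * i for i in range(n))    # CPU-bound dummy task

def timed(fn) -> float:
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

if __name__ == "__main__":
    jobs = [200_000] * 64
    t_serial = timed(lambda: [work(n) for n in jobs])
    with Pool(processes=4) as pool:
        t_parallel = timed(lambda: pool.map(work, jobs))
    print(f"speedup on 4 workers: {t_serial / t_parallel:.2f}x")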