1980, IEEE Symposium on Foundations of Computer Science
This paper investigates the intersection of hardware complexity and parallel computation, proposing new models for parallel hardware that bridge previously established computational resources. Through these models, a detailed relationship between hardware complexity classes and known parallel resource classes is established, challenging traditional views on the efficiency of simulations involving parallel time and hardware requirements. The authors also introduce characterizations of important classes such as NC and SC, addressing long-standing open questions in the field.
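For reference, the two classes named here admit standard characterizations (conventional formulations, stated here for context, not quoted from the paper):

\[ \mathrm{NC} = \bigcup_{k \ge 1} \{\, L : L \text{ is decided by uniform circuits of depth } O(\log^k n) \text{ and size } n^{O(1)} \,\} \]

\[ \mathrm{SC} = \bigcup_{k \ge 1} \mathrm{TIME\text{-}SPACE}\big(n^{O(1)},\, O(\log^k n)\big) \]

That is, NC captures polylogarithmic parallel time with polynomial hardware, while SC captures sequential polynomial time with polylogarithmic space.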
21st Annual Symposium on Foundations of Computer Science (sfcs 1980), 1980
2010
This thesis reviews selected topics from the theory of parallel computation. The research begins with a survey of proposed models of parallel computation, examining the characteristics of each model and discussing its use for theoretical studies or for practical applications. It then employs common simulation techniques to evaluate the computational power of these models; the simulations establish certain relations among the models before advancing to a detailed study of parallel complexity theory, which is the subject of the second part of the thesis. The second part examines classes of feasible highly parallel problems and investigates the limits of parallelization. It is concerned with the benefits of parallel solutions and the extent to which they can be applied to all problems. It analyzes the parallel complexity of various well-known tractable problems and discusses the automatic parallelization of efficient sequential algorithms. Moreover, it ...
Theoretical Computer Science, 1990
Abstract. This paper outlines a theory of parallel algorithms that emphasizes two crucial aspects of parallel computation: speedup, the improvement in running time due to parallelism, and efficiency, the ratio of work done by a parallel algorithm to the work done by a sequential algorithm. We define six classes of algorithms in these terms; of particular interest is the class EP, of algorithms that achieve a polynomial speedup with constant efficiency. The relations between these classes are examined. We investigate the robustness of these classes across various models of parallel computation. To do so, we examine simulations across models where the simulating machine may be smaller than the simulated machine. These simulations are analyzed with respect to their efficiency and to the reduction in the number of processors. We show that a large number of parallel computation models are related via efficient simulations, if a polynomial reduction of the number of processors is allowed. This implies that the class EP is invariant across all these models. Many open problems motivated by our approach are listed.

1. Introduction. As parallel computers become increasingly available, a theory of parallel algorithms is needed to guide the design of algorithms for such machines. To be useful, such a theory must address two major concerns in parallel computation, namely speedup and efficiency. It should classify algorithms and problems into a few, meaningful classes that are, to the largest extent possible, model independent. This paper outlines an approach to the analysis of parallel algorithms that we feel answers these concerns without sacrificing too much generality or abstractness. We propose a classification of parallel algorithms in terms of parallel running time and inefficiency, which is the extra amount of work done by a parallel algorithm as compared to a sequential algorithm. Both running time and inefficiency are measured as a function of the sequential running time, which is used as a yardstick. * A preliminary version of this paper was presented at the 15th International Colloquium on Automata, Languages and Programming.
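In symbols, writing T*(n) for the best sequential running time and T_p(n) for the parallel running time on p processors (notation ours, chosen to match the definitions above):

\[ S(n) = \frac{T^{*}(n)}{T_{p}(n)}, \qquad E(n) = \frac{T^{*}(n)}{p \cdot T_{p}(n)} \]

and an algorithm belongs to EP when S(n) \ge T^{*}(n)^{\varepsilon} for some fixed \varepsilon > 0 while E(n) = \Theta(1), i.e., polynomial speedup at constant efficiency.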
1978
We investigate a certain model of synchronous parallelism. Syntax, semantics and complexity of programs within it are defined. We consider algorithmic properties of synchronous parallel programs in connection with sequential programs with arrays. The complexity theorem states that the class PP-time (polynomial-time bounded parallel languages) is equal to P-space (languages requiring a polynomial amount of memory). Parallel statements have the form

cobegin (I_1 : p_1), ..., (I_r : p_r) coend

where: 1. p_j is the relation programmable in R, written in the form K-alpha for some program K in FS_R and an open formula alpha, for j = 1, ..., r (cf. [1]). 2. I_j, for j = 1, ..., r, is a sequential program from FS_R. 3. For all j = 1, ..., r, the set of free index variables in I_j and p_j is the same. 4. For all j = 1, ..., r, no index variable in I_j may occur as the left side of a substitution (this restriction is implied by the semantics, because p_j assigns the variables on which program I_j operates). Denote T_j = {(n_1, ..., n_{k_j}) : p_j(n_1, ..., n_{k_j})(v) = 1} for j = 1, ..., r; the set T_j is the set of all sequences satisfying p_j. In the last two chapters of the paper we have to make some restrictions on the form of the programmable relation p_j: if we allow p_j to be an arbitrary formula K-alpha, then even the finiteness of p_j could become undecidable. In order to obtain the effectiveness and complexity theorems we assume that p_j is given by a system of linear inequalities (with respect to the index variables). The associated checking procedure then proceeds as follows: 1. If T_j is not finite (in the case of a system of linear inequalities this problem is decidable), stop without result. 2. Following the semantics definition, check whether there exists an unavoidable variable conflict (point ii)); if yes, stop with an undefined result.
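As a toy illustration of the cobegin ... coend construct (a sketch of ours, not the paper's formal semantics; every name in it is invented): each branch I_j is instantiated once per index tuple satisfying its guard relation p_j, and the whole group runs as one synchronous step. Disjointness of the guards is what rules out the variable conflicts discussed above.

import threading

def run_cobegin(branches, index_range):
    """Run cobegin (I_1 : p_1), ..., (I_r : p_r) coend over an index range:
    one thread per index satisfying a guard, joined at a common barrier."""
    threads = [
        threading.Thread(target=body, args=(i,))
        for body, guard in branches          # branch = (I_j, p_j)
        for i in index_range
        if guard(i)                          # i ranges over T_j
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                             # the step ends when all finish

# Example: double even-indexed cells, negate odd-indexed ones; the two
# guards are disjoint, so no variable conflict can arise.
a = list(range(8))
run_cobegin(
    branches=[
        (lambda i: a.__setitem__(i, 2 * a[i]), lambda i: i % 2 == 0),
        (lambda i: a.__setitem__(i, -a[i]), lambda i: i % 2 == 1),
    ],
    index_range=range(8),
)
print(a)  # [0, -1, 4, -3, 8, -5, 12, -7]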
1994
A complexity model based on the λ-calculus with an appropriate operational semantics is presented and related to various parallel machine models, including the PRAM and hypercube models. The model is used to study parallel algorithms in the context of "sequential" functional languages, and to relate these results to algorithms designed directly for parallel machine models. For example, the paper shows that equally good upper bounds can be achieved for merging two sorted sequences in the pure λ-calculus with some arithmetic constants as in the EREW PRAM, when they are both mapped onto a more realistic machine such as a hypercube or butterfly network. In particular, for n keys and p processors, they both result in an O(n/p + log² p) time algorithm. These results argue that it is possible to get good parallelism in functional languages without adding explicitly parallel constructs. In fact, the lack of random access seems to be a bigger problem than the lack of parallelism.
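The merging bound can be illustrated with the standard block-partitioned merge (a schematic, sequential rendering of the textbook technique, not the paper's λ-calculus formulation): cut the output into p blocks, locate each block boundary in both inputs by binary search, then merge the blocks independently, for O(n/p) work per processor after an O(log n) search.

import heapq

def corank(k, a, b):
    """Find the split (i, j), i + j = k, such that a[:i] and b[:j] together
    contain the k smallest elements of the merge; binary search, O(log n)."""
    lo, hi = max(0, k - len(b)), min(k, len(a))
    while True:
        i = (lo + hi) // 2
        j = k - i
        if i < len(a) and j > 0 and a[i] < b[j - 1]:
            lo = i + 1          # too few elements taken from a
        elif i > 0 and j < len(b) and b[j] < a[i - 1]:
            hi = i - 1          # too many elements taken from a
        else:
            return i, j

def block_merge(a, b, p):
    """Merge sorted lists a and b as p independent block merges; the p
    'processors' are simulated here by an ordinary loop."""
    n = len(a) + len(b)
    bounds = [corank(k * n // p, a, b) for k in range(p + 1)]
    out = []
    for (i0, j0), (i1, j1) in zip(bounds, bounds[1:]):
        out.extend(heapq.merge(a[i0:i1], b[j0:j1]))
    return out

print(block_merge([1, 3, 5, 7], [2, 4, 6, 8], p=3))  # [1, 2, 3, 4, 5, 6, 7, 8]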
Symposium on the Theory of Computing, 1982
1979
A simple model of concurrent computations is presented in which disjoint instructions (processes) of a program are executed concurrently by processors (available in a sufficiently large number) under a shared-memory environment. The semantics of such a program specifies the tree of configuration sequences which are acceptable as possible computations of the program. We do not agree with the existing literature (e.g. [2]) that every sharing of one processor among processes can be conceived of as concurrency. We claim that the other meaning of concurrency can be defined as well; the difference between these two meanings turns out to be essential. We do not assume that each configuration is obtained from its predecessor in the computation by exactly one processor performing an atomic step (assignment or test) in a process. On the contrary, we assume that a processor cannot be delayed during its activities. The length of a step is indefinite; it need only be finite. This reflects the various speeds of processors. Hence, for a configuration in which several processors are able to start the execution of their subsequent steps, a maximal number of atomic steps will be started, the choice being nondeterministic. We discuss semantical phenomena of concurrent computations. It is argued that they can be expressed in the language of an algorithmic logic; the problem of complete axiomatization of the latter remains open. A comparison with another model of concurrency, Petri nets, is given and, we hope, is interesting, for our approach offers a structured (algebraic) restriction of the language of nets and new variants of semantics. From the results obtained in the theory of vector addition systems we learn an important property of concurrent computations: there is no faithful one-processor simulation of them.
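The maximal-step rule can be rendered as a toy procedure (our illustration, with invented names, not the paper's formalism): from a configuration where several processors can start a step, a maximal conflict-free set of enabled atomic steps is started, the choice among such sets being nondeterministic.

import random

def maximal_step(enabled, conflicts):
    """Choose a maximal set of enabled steps, no two of which conflict.

    `enabled` lists step ids; `conflicts` is a set of frozenset pairs that
    cannot start together (e.g. two writes to the same variable). The
    model's nondeterminism is rendered by the initial shuffle."""
    order = enabled[:]
    random.shuffle(order)
    chosen = []
    for s in order:
        if all(frozenset((s, t)) not in conflicts for t in chosen):
            chosen.append(s)
    return chosen  # maximal: every excluded step conflicts with a chosen one

# Two writers to x conflict; a reader of y conflicts with neither, so it
# always starts, while exactly one of the two writers does.
print(maximal_step(["w1_x", "w2_x", "r_y"], {frozenset(("w1_x", "w2_x"))}))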
1991
In recent years, powerful theoretical techniques have been developed for supporting communication, synchronization and fault tolerance in general-purpose parallel computing. The proposition of this thesis is that different techniques should be used to support different algorithms. The determining factor is granularity, or the extent to which an algorithm uses long blocks for communication between processors. We consider the Block PRAM model of Aggarwal, Chandra and Snir, a synchronous model of parallel computation in which the processors communicate by accessing a shared memory. In the Block PRAM model, there is a time cost for each access by a processor to a block of locations in the shared memory. This feature of the model encourages the use of long blocks for communication. In the thesis we present Block PRAM algorithms and lower bounds for specific problems on arrays, lists, expression trees, graphs, strings, binary trees and butterflies. These results introduce useful basic techniques for parallel computation in practice, and provide a classification of problems and algorithms according to their granularity. Also presented are optimal algorithms for universal hashing and skewing, which are techniques for supporting conflict-free memory access in general- and special-purpose parallel computations, respectively. We explore the Block PRAM model as a theoretical basis for the design of scalable general-purpose parallel computers. Several simulation results are presented which show the Block PRAM model to be comparable to, and competitive with, other models that have been proposed for this role. Two major advantages of machines based on the Block PRAM model are that they preserve the granularity properties of individual algorithms and can efficiently incorporate a significant degree of fault tolerance. The thesis also discusses methods for the design of algorithms that do not use synchronization; we apply these methods to define fast circuits for several fundamental Boolean functions.
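A common way to charge for communication in the Block PRAM is one startup latency per access plus one unit per cell transferred (our reading of the cost model; the parameter names are invented). The toy computation below shows why coarse granularity, i.e. long blocks, pays off.

def total_cost(n, block_length, latency):
    """Block PRAM style cost of reading n cells in blocks of a given
    length: ceil(n / block_length) accesses, each charged
    latency + block_length."""
    blocks = -(-n // block_length)  # ceil(n / block_length)
    return blocks * (latency + block_length)

# Reading 10**6 cells with startup latency 100:
print(total_cost(10**6, 1, 100))      # 101000000 (word at a time)
print(total_cost(10**6, 10**4, 100))  # 1010000   (long blocks, ~100x cheaper)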
Information Processing Letters, 1977
Mathematical Foundations of Computer Science 1992, 1992
Weak parallel machines represent a new class of physically feasible parallel machine models whose prominent representative is the so-called Parallel Turing Machine (PTM), introduced by the author in 1984. Besides PTMs, further members of this class are, e.g., various kinds of systolic machines, cellular automata, orthogonal iterative arrays, etc. From the computational point of view, the main common feature of weak parallel machines is their ability to perform pipelined computations efficiently, which is used in characterizing the corresponding machine class by the so-called Pipelined Computation Thesis. This thesis states that on these machines the period of computation is polynomially related to the space of sequential Turing machine computations. The paper gives a brief overview of the most important known results concerning PTMs and extends them with new results stressing the significance of PTMs in the context of physically feasible parallel computations.
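In complexity-class notation the thesis reads (our paraphrase of the statement above, with PERIOD denoting the period of pipelined PTM computations):

\[ \bigcup_{k \ge 1} \mathrm{PTM\text{-}PERIOD}\big(T(n)^{k}\big) \;=\; \bigcup_{k \ge 1} \mathrm{DSPACE}\big(T(n)^{k}\big) \]

This is the pipelined analogue of the parallel computation thesis, which relates parallel time, rather than period, to sequential Turing machine space.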
Theoretical Computer Science, 2003
This paper contains answers to several problems in the theory of the computational complexity of infinite words. We show that the problem whether all infinite words generated by iterating deterministic generalized sequential machines have logarithmic space complexity is equivalent to the open problem asking whether the unary classes of languages in P and in DLOG are equivalent. Similarly, the problem of finding a concrete infinite word which cannot be generated in logarithmic space is equivalent to the problem of finding a concrete language which does not belong to DSPACE(n). Finally, we separate classes of infinite words generated by double and triple D0L TAG systems.
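As a concrete instance of the generation mechanism (a standard textbook example, not one taken from the paper): iterating a D0L morphism is the simplest case of generating an infinite word by repeated deterministic substitution, and the morphism 0 -> 01, 1 -> 10 yields the Thue-Morse word.

def iterate_morphism(morphism, seed, steps):
    """Prefix of an infinite word obtained by iterating a D0L morphism."""
    word = seed
    for _ in range(steps):
        word = "".join(morphism[c] for c in word)
    return word

# Thue-Morse word: fixed point of 0 -> 01, 1 -> 10.
print(iterate_morphism({"0": "01", "1": "10"}, "0", 4))  # 0110100110010110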
Journal of Parallel and Distributed Computing, 1991
The many revolutionary changes brought about by the integrated chip, in the form of significant improvements in processing, storage, and communications, have also brought about a host of related problems for designers and users of parallel and distributed systems. These systems develop and proliferate at an amazing momentum, motivating research in the understanding and testing of complex distributed systems. Unfortunately, these relatively expensive systems are being designed, built, used, refined, and rebuilt (at perhaps an avoidable expense) even before we have developed methodology for understanding the underlying principles of their behavior. Though it is not realistic to expect that the current rate of manufacturing can be slowed down to accommodate research in design principles, it behooves us to bring attention to the importance of design methodology and performance understanding of such systems and, in this way, to attempt to influence parallel system design in a positive manner. At the present time, there is considerable debate among various schools of thought on parallel machine architectures, with different schools proposing different architectures and design philosophies. Consider, for example, one such debate involving tightly coupled systems. Early on, Minsky [1] conjectured a somewhat pessimistic bound of log n for typical speedup on n processors. Since then, researchers [2] have shown that certain characteristics of programs, such as the DO loops in Fortran, can often be exploited to yield more optimistic levels of speedup. Other researchers [3] counter this kind of optimism by pointing out that parallel and vector processing has limitations in potential speedup (i.e., Amdahl's law) to the extent that speedup is bounded from above by n/(s·n + 1 - s), where s is the fraction of a computation that must be done serially. This suggests that it makes more sense to first concentrate on achieving the maximum speedup possible with a single powerful processor. In this view, the distributed approach is not as attractive an option. More recently, work on hypercubes [4] appears to indicate that
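Amdahl's bound quoted above is easy to evaluate numerically (a worked example of ours): with a 10% serial fraction the speedup saturates near 1/s = 10 no matter how many processors are added.

def amdahl_speedup(n, s):
    """Upper bound on speedup with n processors when a fraction s of the
    computation must be done serially: n / (s*n + 1 - s)."""
    return n / (s * n + 1 - s)

for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(n, 0.10), 2))
# 10 5.26, 100 9.17, 1000 9.91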
… Proceedings of the Conference Held at …, 1986
Journal of Parallel and Distributed Computing, 1994
We discuss issues pertinent to performance analysis of massively parallel systems. We first argue that single-parameter characterization of parallel software or of parallel hardware rarely provides insight into the complex interactions among the software and the hardware components of a parallel system. In particular, bounds for the speedup based upon simple models of parallelism are violated when a model ignores the effects of communication.
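One way to see the claim (a toy cost model of our own, not the paper's analysis): add to the ideal parallel time t/p a communication term that grows with the processor count, and the speedup predicted by the simple model first falls short, then degrades outright.

def speedup(t_seq, p, comm_per_proc=0.0):
    """Speedup under a toy model: perfect parallel work t_seq / p plus a
    communication overhead growing linearly with the processor count."""
    return t_seq / (t_seq / p + comm_per_proc * p)

t = 10_000.0
for p in (10, 100, 1000):
    print(p, round(speedup(t, p), 1), round(speedup(t, p, comm_per_proc=0.5), 1))
# ideal speedup equals p; with communication it peaks near p ~ 140, then falls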
New Generation Computing, 1999
We formalize the implementation mechanisms required to support or-parallel execution of logic programs in terms of operations on dynamic data structures. Upper and lower bounds are derived, in terms of the number of operations n performed on the data structure, for the problem of guaranteeing correct semantics during or-parallel execution.
Foundations of Software Technology and …, 1997
We study several data-structures and operations that commonly arise in parallel implementations of logic programming languages. The main problems that arise in implementing such parallel systems are abstracted out and precisely stated. Upper and ...
Theoretical Computer Science, 1989
We present efficient time-bounded ATM and space-bounded TM simulations of one-way conglomerates (OWCs), which are interconnection networks of finite-state machines that allow only one-way communication between adjacent nodes. In particular, we show that OWCs with depth D(n) and operating in time T(n) can be simulated by ATMs in time O(D(n) · log T(n)) (and hence by a TM with the same amount of space). This extends Ruzzo's result that boolean circuits of depth D(n) can be simulated by O(D(n))-time bounded ATMs, and refines Goldschlager's result that two-way conglomerates operating in T(n) time can be simulated by T(n)-space bounded TMs. By exploiting the regularity of interconnections in some OWCs, we obtain more efficient space-bounded TM simulations. For example, using the ATM result, a k-dimensional one-way mesh array of n^k nodes would require roughly n^{k+1} space on a TM (such an array can run in exponential time in the worst case); we show that this space bound can be reduced.