1990, Computers & Mathematics with Applications
We review some of the most important results in the area of fast parallel algorithms for the solution of linear systems and related problems, such as matrix inversion, computation of the determinant and of the adjoint matrix. We analyze both direct and iterative methods implemented in various models of parallel computation.
Computers & Mathematics with Applications, 1989
Algorithmica, 2001
In the last two decades several NC algorithms for solving basic linear algebraic problems have appeared in the literature. This interest was clearly motivated by the emergence of a parallel computing technology and by the wide applicability of matrix computations. The traditionally adopted computation model, however, ignores the arithmetic aspects of the applications, and no analysis is currently available demonstrating the concrete feasibility of many of the known fast methods. In this paper we give strong evidence to the contrary, on the sole basis of the issue of robustness, indicating that some theoretically brilliant solutions fail the severe test of the "Engineering of Algorithms." We perform a comparative analysis of several well-known numerical matrix inversion algorithms under both fixed- and variable-precision models of arithmetic. We show that, for most methods investigated, a typical input leads to poor numerical performance, and that in the exact-arithmetic setting no benefit derives from conditions usually deemed favorable in standard scientific computing. Under these circumstances, the only algorithm admitting sufficiently accurate NC implementations is Newton's iterative method, and the word size required to guarantee worst-case correctness appears to be the critical complexity measure. Our analysis also accounts for the observed instability of the considered superfast methods when implemented with the same floating-point arithmetic that is perfectly adequate for the fixed-precision approach.
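For concreteness, the Newton iteration referred to above is the Newton-Schulz recurrence X_{k+1} = X_k(2I - AX_k). A minimal NumPy sketch follows; the starting guess X_0 = A^T / (||A||_1 ||A||_inf) and the stopping tolerance are standard illustrative choices, not details taken from the paper.

```python
import numpy as np

def newton_inverse(A, tol=1e-12, max_iter=100):
    """Approximate A^{-1} with the Newton-Schulz iteration X <- X(2I - AX).

    The starting guess X0 = A^T / (||A||_1 * ||A||_inf) is a classical
    choice that yields convergence for nonsingular A; the tolerance and
    iteration cap are illustrative.
    """
    n = A.shape[0]
    I = np.eye(n)
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(max_iter):
        R = I - A @ X                      # residual drives the correction
        if np.linalg.norm(R, np.inf) < tol:
            break
        X = X + X @ R                      # algebraically equal to X @ (2I - A @ X)
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 50)) + 50 * np.eye(50)   # well-conditioned test matrix
    print(np.linalg.norm(newton_inverse(A) @ A - np.eye(50)))
```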
Advances in Engineering Software, 1999
A simple direct method and its variants, based on matrix multiplications, are developed to invert square non-singular matrices (which need not be positive definite) and to solve systems of linear equations. Since the method and its variants involve only matrix multiplications they are straightforward to parallelise and hence have an advantage over some existing well known direct methods. Theoretical background, analysis of the proposed method and the complexity of the algorithm for sparse and dense matrices are given. Two significantly different parallel algorithms for dense and sparse matrices are developed. In the case of the dense matrix, standard parallelisation of matrix multiplications was undertaken. However, for sparse matrices the standard parallel matrix multiplication is not effective, which led us to develop a parallel algorithm different from that for the dense matrix. The performance of our algorithms is tested via numerical examples.
Acta Numerica, 1993
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, the singular value decomposition, and generalizations of these to two matrices. We consider dense, band and sparse matrices.
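The survey's running example is matrix multiplication; the sketch below shows the tile (block) decomposition that parallel implementations typically distribute across processors, written here sequentially. The block size and loop order are illustrative assumptions.

```python
import numpy as np

def blocked_matmul(A, B, bs=64):
    """C = A @ B computed tile by tile.

    Each (i, j) output tile accumulates the products A[i, p] @ B[p, j];
    the (i, j) tiles are independent of one another, which is the
    decomposition most distributed implementations start from.
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(0, m, bs):
        for j in range(0, n, bs):           # independent output tiles
            for p in range(0, k, bs):       # reduction dimension
                C[i:i+bs, j:j+bs] += A[i:i+bs, p:p+bs] @ B[p:p+bs, j:j+bs]
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, B = rng.random((200, 150)), rng.random((150, 120))
    print(np.allclose(blocked_matmul(A, B), A @ B))
```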
Lecture Notes in Computer Science, 1994
This paper proposes a general execution scheme for parallelizing a class of iterative algorithms characterized by strong data dependencies between iterations. This class includes non-simultaneous iterative methods for solving systems of linear equations, such as Gauss-Seidel and SOR, and long-range methods. The paper presents a set of code transformations that make it possible to derive the parallel form of the algorithm starting from sequential code. The performance of the proposed execution scheme is then analyzed with respect to an abstract model of the underlying parallel machine. We wish to thank P. Sguazzero for his helpful hints and suggestions, and IBM ECSEC for making available to us the SP1 machine on which the experimental measurements were performed. This work has been supported by Consiglio Nazionale delle Ricerche under the "Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo" and by MURST (40% funds).
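For reference, the sketch below is the sequential SOR sweep (Gauss-Seidel when omega = 1) whose loop-carried dependency, each x[i] using the freshly updated x[0..i-1], is exactly what such code transformations must deal with. The relaxation factor, tolerance, and test data are illustrative; this is not the parallel scheme derived in the paper.

```python
import numpy as np

def sor_solve(A, b, omega=1.0, tol=1e-10, max_iter=10_000):
    """Successive over-relaxation sweep; omega = 1 gives Gauss-Seidel.

    Each update of x[i] uses the newest values of x[0..i-1], which is
    the data dependency between iterations that the paper targets.
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            sigma = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            break
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.random((100, 100)) + 100 * np.eye(100)   # diagonally dominant, so the sweep converges
    b = rng.random(100)
    print(np.linalg.norm(A @ sor_solve(A, b) - b))
```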
Computers & Mathematics with Applications, 1995
In this paper, a variant of Gaussian Elimination (GE) called the Successive Gaussian Elimination (SGE) algorithm for the parallel solution of linear equations is presented. Unlike the conventional GE algorithm, the SGE algorithm does not have a separate back substitution phase, which requires O(N) steps using O(N) processors or O(log^2 N) steps using O(N^3) processors for solving a system of linear algebraic equations. It replaces the back substitution phase by a single division step and possesses numerical stability through partial pivoting. Further, in this paper, the SGE algorithm is shown to produce the diagonal form in the same amount of parallel time required for producing the triangular form using the conventional parallel GE algorithm. Finally, the effectiveness of the SGE algorithm is demonstrated by studying its performance on a hypercube multiprocessor system.
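The abstract does not reproduce SGE itself; the hedged sketch below only illustrates the underlying idea of eliminating entries both above and below each pivot, so that the system ends in diagonal form and back substitution is replaced by one division per unknown.

```python
import numpy as np

def diagonalize_and_solve(A, b):
    """Eliminate above and below each pivot (with partial pivoting), so
    the system ends in diagonal form and each unknown needs one division.
    This sketches the idea of avoiding a back-substitution phase; it is
    not the SGE algorithm of the paper itself.
    """
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    for k in range(n):
        p = k + np.argmax(np.abs(A[k:, k]))          # partial pivoting
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]  # swap rows k and p
        for i in range(n):
            if i != k:
                f = A[i, k] / A[k, k]
                A[i, k:] -= f * A[k, k:]
                b[i] -= f * b[k]
    return b / np.diag(A)                            # the single division step

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    A, b = rng.random((8, 8)) + 8 * np.eye(8), rng.random(8)
    print(np.allclose(diagonalize_and_solve(A, b), np.linalg.solve(A, b)))
```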
Journal of the ACM, 1978
In this paper three stable parallel algorithms for solving dense and tridiagonal systems of linear equations are discussed. The algorithms are based on Givens' reduction of a matrix to upper triangular form. The algorithm for the dense case requires O(n) time steps compared to O(n log n) steps for Gaussian elimination with pivoting (in the absence of certain features of machine logic and hardware). For the tridiagonal case, one of the algorithms presented here is superior to the best previous algorithm in that, with a modest increase in time, it does not fail if any of the leading principal submatrices is singular, the probability of over- or underflow is minimized, and the error bound does not grow exponentially. Furthermore, it is most suitable when only a limited number of processors is available.
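As background, a compact dense-case sketch of solving Ax = b via Givens rotations follows. The elimination order and test data are illustrative assumptions; the relevant property is that rotations acting on disjoint row pairs are independent, which is what parallel Givens schemes exploit.

```python
import numpy as np

def givens_solve(A, b):
    """Solve Ax = b by zeroing subdiagonal entries with Givens rotations,
    then solving the resulting upper triangular system.
    """
    R = A.astype(float)
    y = b.astype(float)
    n = R.shape[0]
    for j in range(n - 1):
        for i in range(n - 1, j, -1):            # zero out R[i, j] using row i-1
            f, g = R[i - 1, j], R[i, j]
            r = np.hypot(f, g)
            if r == 0.0:
                continue
            c, s = f / r, g / r
            G = np.array([[c, s], [-s, c]])      # 2x2 rotation on rows i-1 and i
            R[[i - 1, i], j:] = G @ R[[i - 1, i], j:]
            y[[i - 1, i]] = G @ y[[i - 1, i]]
    return np.linalg.solve(np.triu(R), y)        # back substitution on the triangle

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    A, b = rng.random((6, 6)), rng.random(6)
    print(np.allclose(givens_solve(A, b), np.linalg.solve(A, b)))
```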
2008
In this paper we apply different strategies to parallelize a structure-preserving doubling method for the rational matrix equation X = Q + LX^{-1}L^T. Several levels of parallelism are exploited to enhance performance, and standard sequential and parallel linear algebra libraries are used to obtain portable and efficient implementations of the algorithms. Experimental results on a shared-memory multiprocessor show that a coarse-grain approach which combines two MPI processes with a multithreaded implementation of BLAS in general yields the highest performance.
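To make the equation concrete, the sketch below runs the basic fixed-point iteration X <- Q + L X^{-1} L^T. It is emphatically not the structure-preserving doubling method studied in the paper, and the test data are illustrative.

```python
import numpy as np

def fixed_point_rational(Q, L, tol=1e-12, max_iter=500):
    """Basic fixed-point iteration X <- Q + L X^{-1} L^T for the rational
    matrix equation X = Q + L X^{-1} L^T. NOT the doubling method of the
    paper; only meant to show the equation being solved.
    """
    X = Q.copy()
    for _ in range(max_iter):
        X_new = Q + L @ np.linalg.solve(X, L.T)   # L X^{-1} L^T without forming X^{-1}
        if np.linalg.norm(X_new - X, np.inf) < tol:
            return X_new
        X = X_new
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    n = 5
    L = rng.random((n, n))
    Q = 10 * np.eye(n)                            # SPD Q; illustrative data only
    X = fixed_point_rational(Q, L)
    print(np.linalg.norm(X - (Q + L @ np.linalg.solve(X, L.T))))   # residual of the equation
```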
Advances in Parallel and Vector Processing for Structural Mechanics, 1994
We are interested in iterative algorithms that lend themselves to high-level parallel computation. One example is the solution of very large and sparse linear systems with multiple right-hand sides. We report on some recent work on this topic and present new results on the parallel solution of this problem. We show that algorithms that perform some amount of information exchange while the systems are being solved can be very competitive compared to algorithms that proceed independently.
Theoretical Computer Science, 1997
The complexity of performing matrix computations, such as solving a linear system, inverting a nonsingular matrix or computing its rank, has received a lot of attention by both the theory and the scientific computing communities. In this paper we address some "nonclassical" matrix problems that find extensive applications, notably in control theory. More precisely, we study the matrix equations AX + XA^T = C and AX - XB = C, the "inverse" of the eigenvalue problem (called pole assignment), and the problem of testing whether the matrix [B AB ... A^{n-1}B] has full rank (controllability).
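As a reference point for the Sylvester equation AX - XB = C, the textbook Kronecker-product reduction to a single linear system is sketched below. This dense formulation costs O((mn)^3) and is only workable for small matrices; it is not one of the algorithms analyzed in the paper.

```python
import numpy as np

def sylvester_kron(A, B, C):
    """Solve AX - XB = C by vectorization:
    (I_n kron A - B^T kron I_m) vec(X) = vec(C), with column-stacking vec.
    A is m x m, B is n x n, C and X are m x n. Textbook dense reduction.
    """
    m, n = C.shape
    K = np.kron(np.eye(n), A) - np.kron(B.T, np.eye(m))
    x = np.linalg.solve(K, C.flatten(order="F"))     # column-stacked vec(C)
    return x.reshape((m, n), order="F")

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    A = rng.random((4, 4))
    B = rng.random((3, 3)) - 5 * np.eye(3)           # keep the spectra of A and B disjoint
    C = rng.random((4, 3))
    X = sylvester_kron(A, B, C)
    print(np.linalg.norm(A @ X - X @ B - C))
```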
This paper describes techniques to compute matrix inverses by means of algorithms that are highly suited to massively parallel computation. In contrast, conventional techniques such as pivoted Gaussian elimination and LU decomposition are efficient only on vector computers or fairly low-level parallel systems. These techniques are based on an algorithm suggested by Strassen in 1969. Variations of this scheme employ matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55% faster than a conventional library routine to one that, while slower than a library routine, achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
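The abstract does not spell out the algorithm; the recursion below is the standard 2x2-block (Schur complement) inversion that Strassen's 1969 scheme builds on, written with plain matrix products and without pivoting or the Newton-iteration stabilization discussed above. The cutoff and test data are illustrative.

```python
import numpy as np

def block_inverse(A, cutoff=32):
    """Invert A by recursive 2x2-block (Schur complement) elimination.
    No pivoting, so it assumes the leading blocks remain well conditioned.
    """
    n = A.shape[0]
    if n <= cutoff:
        return np.linalg.inv(A)
    k = n // 2
    A11, A12 = A[:k, :k], A[:k, k:]
    A21, A22 = A[k:, :k], A[k:, k:]
    A11_inv = block_inverse(A11, cutoff)
    S = A22 - A21 @ A11_inv @ A12                # Schur complement of A11
    S_inv = block_inverse(S, cutoff)
    B12 = -A11_inv @ A12 @ S_inv
    B21 = -S_inv @ A21 @ A11_inv
    B11 = A11_inv - B12 @ A21 @ A11_inv          # A11^{-1} + A11^{-1} A12 S^{-1} A21 A11^{-1}
    return np.block([[B11, B12], [B21, S_inv]])

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    A = rng.random((200, 200)) + 200 * np.eye(200)   # keeps all pivot blocks nonsingular
    print(np.linalg.norm(block_inverse(A) @ A - np.eye(200)))
```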
Parallel Algorithms and Applications
This paper considers the Jordan and Huard diagonalization methods for solving linear systems on an MIMD computer. We introduce two parallel algorithms for this class of methods and study their complexity taking into consideration the communication cost. Next, in an attempt to reduce the communication load we introduce their corresponding block versions. Finally, we derive new complexity results and compare their asymptotic performances.
IEEE Transactions on Computers, 1980
An MIMD-type parallel-processing system was introduced recently. Using an extended PL/I notation, algorithms for the Gauss-Seidel method and for back substitution are written for this system. Block execution is shown to result in higher speedups.
We present the first parallel algorithm for solving systems of linear equations in symmetric, diagonally dominant (SDD) matrices that runs in polylogarithmic time and nearly-linear work. The heart of our algorithm is a construction of a sparse approximate inverse chain for the input matrix: a sequence of sparse matrices whose product approximates its inverse. Whereas other fast algorithms for solving systems of equations in SDD matrices exploit low-stretch spanning trees, our algorithm only requires spectral graph sparsifiers.
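For background (not quoted from the paper), constructions of this kind typically rest on a splitting identity of the following form: for M = D - A with D the diagonal of M, M^{-1} = (1/2)[D^{-1} + (I + D^{-1}A)(D - A D^{-1} A)^{-1}(I + A D^{-1})], which shifts the work onto D - A D^{-1} A and can be applied recursively with sparsification at every level. The snippet below merely verifies this identity numerically on a small SDD matrix.

```python
import numpy as np

# Numerical check of the splitting identity
#   M^{-1} = 1/2 [ D^{-1} + (I + D^{-1} A)(D - A D^{-1} A)^{-1}(I + A D^{-1}) ]
# for M = D - A, with D the diagonal of M. Stated as background; the paper's
# chain construction and sparsification are not reproduced here.
rng = np.random.default_rng(10)
n = 30
U = -np.abs(rng.random((n, n)))
A_off = np.triu(U, 1) + np.triu(U, 1).T              # symmetric, nonpositive off-diagonal
M = np.diag(-A_off.sum(axis=1) + 1.0) + A_off        # strictly diagonally dominant SDD matrix
D = np.diag(np.diag(M))
A = D - M                                            # so M = D - A
I = np.eye(n)
Dinv = np.linalg.inv(D)
inner = np.linalg.inv(D - A @ Dinv @ A)
approx = 0.5 * (Dinv + (I + Dinv @ A) @ inner @ (I + A @ Dinv))
print(np.linalg.norm(approx - np.linalg.inv(M)))     # ~ machine precision
```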
IEEE Transactions on Computers, 1977
A parallel processor system and its mode of operation are described. A notation for writing programs on it is introduced. Methods for iterative solution of a set of linear equations are then discussed. The well-known algorithms of Jacobi and Gauss-Seidel are parallelized despite the apparent inherent sequentiality of the latter. New, parallel methods for the iterative solution of linear equations are introduced and their convergence is discussed. A measure of speedup is computed for all methods. It shows that in most cases the algorithms developed in the paper may be efficiently executed on a parallel processor system.
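For comparison with the Gauss-Seidel sweep sketched earlier, the textbook Jacobi iteration follows: every component update depends only on the previous iterate, so all n updates are mutually independent. Test data and tolerance are illustrative, and this is the textbook form rather than the new parallel methods introduced in the paper.

```python
import numpy as np

def jacobi_solve(A, b, tol=1e-10, max_iter=10_000):
    """Jacobi iteration: each component of the new iterate is computed
    from the previous iterate only, so all n updates are independent.
    """
    D = np.diag(A)
    R = A - np.diagflat(D)                 # off-diagonal part of A
    x = np.zeros_like(b, dtype=float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(8)
    A = rng.random((100, 100)) + 100 * np.eye(100)   # diagonal dominance guarantees convergence
    b = rng.random(100)
    print(np.linalg.norm(A @ jacobi_solve(A, b) - b))
```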
1995
A few parallel algorithms for solving triangular systems resulting from parallel factorization of sparse linear systems have been proposed and implemented recently. We present a detailed analysis of parallel complexity and scalability of the best of these algorithms and the results of its implementation on up to 256 processors of the Cray T3D parallel computer. It has been a common belief that parallel sparse triangular solvers are quite unscalable due to a high communication to computation ratio. Our analysis and experiments show that, although not as scalable as the best parallel sparse Cholesky factorization algorithms, parallel sparse triangular solvers can yield reasonable speedups in runtime on hundreds of processors. We also show that for a wide class of problems, the sparse triangular solvers described in this paper are optimal and are asymptotically as scalable as a dense triangular solver.
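A sequential sketch of the level-scheduling idea behind parallel sparse triangular solvers follows: unknowns are grouped into levels so that all unknowns within a level are mutually independent and could be solved concurrently. This illustrates only the dependency structure, not the distributed algorithm analyzed in the paper; the random sparsity pattern is illustrative.

```python
import numpy as np

def level_scheduled_lower_solve(L, b):
    """Forward substitution on a lower-triangular matrix, grouped into
    levels: an unknown's level is one more than the maximum level of the
    unknowns it depends on, so unknowns within a level are independent.
    """
    n = len(b)
    level = np.zeros(n, dtype=int)
    for i in range(n):
        deps = np.flatnonzero(L[i, :i])              # earlier unknowns row i touches
        if deps.size:
            level[i] = level[deps].max() + 1
    x = np.zeros(n)
    for lev in range(level.max() + 1):
        for i in np.flatnonzero(level == lev):       # independent within one level
            x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    n = 60
    L = np.tril(rng.random((n, n)) * (rng.random((n, n)) < 0.05), -1) + np.eye(n)
    b = rng.random(n)
    print(np.allclose(level_scheduled_lower_solve(L, b), np.linalg.solve(L, b)))
```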
SIAM Journal on Computing, 1993
Proceedings of the 12th International Workshop on Programming Models and Applications for Multicores and Manycores, 2021
We take advantage of the new tasking features in OpenMP to propose advanced task-parallel algorithms for the inversion of dense matrices via Gauss-Jordan elimination. Our algorithms perform a partitioning of the matrix operand into two levels of tasks: The matrix is first divided vertically, by column blocks (or panels), in order to accommodate the standard partial pivoting scheme that ensures the numerical stability of the method. In addition, depending on the particular kernel to be applied, each panel is partitioned either horizontally by row blocks (tiles) or vertically by µ-panels (of columns), in order to extract sufficient task parallelism to feed a many-threaded general purpose processor (CPU). The results of the experimental evaluation show the performance benefits of the advanced tasking algorithms on an Intel Xeon Gold processor with 20 cores.