2008, Parallel Computing
This special issue focuses on parallel matrix algorithms, which are essential for numerical simulations involving large linear algebra problems. It features twelve papers stemming from the 4th International Workshop on Parallel Matrix Algorithms and Applications (PMAA06) that cover a variety of topics including matrix solvers, graph ordering, iterative method preconditioners, and applications in molecular dynamics and portfolio management. The contributions provide insights into the development and performance analysis of these algorithms on parallel architectures.
1995
A parallel solvers package of three solvers with a unified user interface is developed for solving a range of sparse symmetric complex linear systems arising from discretization of partial differential equations based on unstructured meshes using finite element, finite difference and finite volume analysis. Once the data interface is set up, the package constructs the sparse symmetric complex matrix, and solves the linear system by the method chosen by the user: either a preconditioned bi-conjugate gradient solver, a two-stage Cholesky LDL^T factorization solver, or a hybrid solver combining the above two methods. A unique feature of the solvers package is that the user deals with local matrices on local meshes on each processor. Scaling problem size N with the number of processors P with N/P fixed, test runs on the Intel Delta up to 128 processors show that the bi-conjugate gradient method scales linearly with N whereas the two-stage hybrid method scales with TN.
SIAM Journal on Scientific Computing, 2007
This paper addresses the parallelization of the preconditioned iterative methods that use explicit preconditioners such as approximate inverses. Parallelizing a full step of these methods requires the coefficient and preconditioner matrices to be well partitioned. We first show that different methods impose different partitioning requirements for the matrices. Then we develop hypergraph models to meet those requirements. In particular, we develop models that enable us to obtain partitionings on the coefficient and preconditioner matrices simultaneously. Experiments on a set of unsymmetric sparse matrices show that the proposed models yield effective partitioning results. A parallel implementation of the right preconditioned BiCGStab method on a PC cluster verifies that the theoretical gains obtained by the models hold in practice.
Applied Mathematics and Computation, 2002
This paper is concerned with a new approach to preconditioning for large, sparse linear systems. A procedure for computing an incomplete factorization of the inverse of a nonsymmetric matrix is developed, and the resulting factorized sparse approximate inverse is used as an explicit preconditioner for conjugate gradient-type methods. Some theoretical properties of the preconditioner are discussed, and numerical experiments on test matrices from the Harwell-Boeing collection and from Tim Davis's collection are presented. Our results indicate that the new preconditioner is cheaper to construct than other approximate inverse preconditioners. Furthermore, the new technique ensures convergence rates of the preconditioned iteration which are comparable with those obtained with standard implicit preconditioners.
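Since a factorized approximate inverse is applied through sparse matrix-vector products rather than triangular solves, it slots into a Krylov iteration as an explicit preconditioner. The following is a minimal sketch of that usage with SciPy; the diagonal stand-in for the approximate inverse, the test matrix, and the choice of BiCGStab are illustrative assumptions, not the construction from the paper.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    # Small stand-in system (the paper uses Harwell-Boeing / Davis matrices).
    n = 200
    A = sp.diags([-1.0, 4.0, -1.2], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)

    # Pretend M ~= A^{-1} is a sparse approximate inverse; here only the
    # diagonal is inverted, whereas a real AINV/FSAI factorization would be
    # computed by an incomplete inverse-factorization algorithm.
    M = sp.diags(1.0 / A.diagonal(), format="csr")

    # Explicit preconditioning: applying M is just a sparse mat-vec,
    # which is why approximate inverses parallelize so well.
    M_op = spla.LinearOperator(A.shape, matvec=lambda v: M @ v)

    x, info = spla.bicgstab(A, b, M=M_op)
    print("converged" if info == 0 else "bicgstab info = %d" % info)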
2009 Computational Electromagnetics International Workshop, 2009
Consider the system Ax = b, where A is a large sparse nonsymmetric matrix. It is assumed that A has no sparsity structure that may be exploited in the solution process, its spectrum may lie on both sides of the imaginary axis and its symmetric part may be indefinite. For such systems direct methods may be both time consuming and storage demanding, while iterative methods may not converge. In this paper, a hybrid method, which attempts to avoid these drawbacks, is proposed. An LU factorization of A that depends on a strategy that drops small non-zero elements during the Gaussian elimination process is used as a preconditioner for conjugate gradient-like schemes, ORTHOMIN, GMRES and CGS. Robustness is achieved by altering the drop tolerance and recomputing the preconditioner in the event that the factorization or the iterative method fails. If after a prescribed number of trials the iterative method is still not convergent, then a switch is made to a direct solver. Numerical examples, using matrices from the Harwell-Boeing test matrices, show that this hybrid scheme is often less time consuming and storage demanding than direct solvers, and more robust than iterative methods that depend on preconditioners based on classical positional dropping strategies.
I. THE HYBRID ALGORITHM
Consider the system of linear algebraic equations Ax = b, where A is a nonsingular, large, sparse and nonsymmetric matrix. We assume also that matrix A is generally sparse (i.e. it has neither any special property, such as symmetry and/or positive definiteness, nor any special pattern, such as bandedness, that can be exploited in the solution of the system). Solving such linear systems may be a rather difficult task. This is so because commonly used direct methods (sparse Gaussian elimination) are too time consuming, and iterative methods whose success depends on the matrix having a definite symmetric part or on the spectrum lying on one side of the imaginary axis are not robust enough. Direct methods have the advantage that they normally produce a sufficiently accurate solution, although a direct estimation of the accuracy actually achieved requires additional work. On the other hand, when iterative methods converge sufficiently fast, they require computing time that is several orders of magnitude smaller than that of any direct method. This brief comparison of the main properties of direct methods and iterative methods for the problem at hand shows that the methods of both groups have some advantages and some disadvantages. Therefore it seems worthwhile to design methods that combine the advantages of both groups, while minimizing their disadvantages.
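The control flow just described, an incomplete LU with a drop tolerance used as a preconditioner, the tolerance tightened and the factorization recomputed when either the factorization or the iteration fails, and a final switch to a direct solver, can be sketched roughly as below. SciPy's spilu and GMRES stand in for the paper's own factorization and its ORTHOMIN/GMRES/CGS schemes, and the tolerance schedule, fill factor, and trial count are assumptions for illustration.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def hybrid_solve(A, b, drop_tols=(1e-2, 1e-3, 1e-4)):
        """Try ILU-preconditioned GMRES with progressively smaller drop
        tolerances; fall back to a sparse direct solve if every trial fails."""
        A = A.tocsc()
        for drop in drop_tols:
            try:
                ilu = spla.spilu(A, drop_tol=drop, fill_factor=10)
            except RuntimeError:
                continue  # factorization broke down: tighten the tolerance
            M = spla.LinearOperator(A.shape, matvec=ilu.solve)
            x, info = spla.gmres(A, b, M=M, restart=50, maxiter=200)
            if info == 0:
                return x
        # All preconditioned trials failed: switch to a direct solver.
        return spla.spsolve(A, b)

    n = 300
    A = sp.random(n, n, density=0.02, format="csr") + 4.0 * sp.eye(n)
    b = np.ones(n)
    print(np.linalg.norm(A @ hybrid_solve(A, b) - b))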
Lecture Notes in Computer Science, 2002
This paper presents an overview of pARMS, a package for solving sparse linear systems on parallel platforms. Preconditioners constitute the most important ingredient in the solution of linear systems arising from realistic scientific and engineering applications. The most common parallel preconditioners used for sparse linear systems adapt domain decomposition concepts to the more general framework of "distributed sparse linear systems". The parallel Algebraic Recursive Multilevel Solver (pARMS) is a recently developed package which integrates variants of both Schwarz procedures and Schur complement-type techniques. This paper discusses a few of the main ideas and design issues of the package. A few details on the implementation of pARMS are provided.
We present preconditioned solvers to find a few eigenvalues and eigenvectors of large dense or sparse symmetric matrices based on the Jacobi-Davidson (JD) method by G. L. G. Sleijpen and H. A. van der Vorst. For preconditioning, we apply a new adaptive approach using the QMR iteration. To parallelize the solvers, we divide the interesting part of the spectrum into a few overlapping intervals and asynchronously exchange eigenvector approximations from neighboring intervals to keep the solutions separated. Per interval, matrix-vector and vector-vector operations of the JD iteration are parallelized by determining a data distribution and a communication scheme from an automatic analysis of the sparsity pattern of the matrix. We demonstrate the efficiency of these parallelization strategies by timings on an Intel Paragon and a Cray T3E system with matrices from real applications.
1 Introduction
We develop preconditioned algorithms for the solution of the following problem on massively p...
Numerical Linear Algebra with Applications, 2000
A preconditioned scheme for solving sparse symmetric eigenproblems is proposed. The solution strategy relies upon the DACG algorithm, which is a Preconditioned Conjugate Gradient algorithm for minimizing the Rayleigh Quotient. A comparison with the well-established ARPACK code shows that when a small number of the leftmost eigenpairs is to be computed, DACG is more efficient than ARPACK. Effective convergence acceleration of DACG is shown to be performed by a suitable approximate inverse preconditioner (AINV). The performance of such a preconditioner is shown to be safe, i.e. not highly dependent on a drop tolerance parameter. On sequential machines, AINV preconditioning proves a practicable alternative to the effective incomplete Cholesky factorization, and is more efficient than Block Jacobi. Due to its parallelizability, the AINV preconditioner is exploited for a parallel implementation of the DACG algorithm. Numerical tests demonstrate the high degree of parallelization attainable on a Cray T3E machine and confirm the satisfactory scalability properties of the algorithm. A final comparison with PARPACK shows the (relative) higher efficiency of AINV-DACG.
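For orientation, DACG exploits the fact that the Rayleigh quotient q(x) = x^T A x / x^T x is minimized by the eigenvector of the smallest eigenvalue, and its gradient (for ||x|| = 1) is proportional to the residual Ax - q(x)x. The sketch below uses a plain preconditioned gradient step instead of DACG's full conjugate-gradient recurrence; the Jacobi preconditioner, fixed step size, and diagonal test matrix are illustrative assumptions.

    import numpy as np
    import scipy.sparse as sp

    def rq_descent(A, M_inv, x0, iters=500, tol=1e-8):
        """Preconditioned gradient iteration on the Rayleigh quotient
        (a simplified stand-in for the conjugate-gradient scheme in DACG)."""
        x = x0 / np.linalg.norm(x0)
        for _ in range(iters):
            Ax = A @ x
            q = x @ Ax                    # Rayleigh quotient, since ||x|| = 1
            g = Ax - q * x                # gradient direction (eigen-residual)
            if np.linalg.norm(g) < tol:
                break
            x = x - 0.5 * (M_inv @ g)     # preconditioned descent step
            x /= np.linalg.norm(x)
        return q, x

    n = 500
    A = sp.diags(np.arange(1.0, n + 1.0), format="csr")   # eigenvalues 1..n
    M_inv = sp.diags(1.0 / A.diagonal())                   # Jacobi preconditioner
    lam, v = rq_descent(A, M_inv, np.random.rand(n))
    print(lam)   # approaches the leftmost eigenvalue, 1.0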
This paper addresses the main issues raised during the parallelization of iterative and direct solvers for sparse linear systems on distributed memory multiprocessors. If no preconditioning is considered, iterative solvers are simple to parallelize, as the most time-consuming computational structures are matrix-vector products. Direct methods are much harder to parallelize, as new nonzero values may appear during computation and pivoting operations are usually required for numerical stability. Suitable data structures and distributions for sparse solvers are discussed within the framework of a data-parallel environment, and experimentally evaluated and compared with existing solutions.
Numerical Linear Algebra with Applications, 2018
Large sparse linear systems arise in many areas of scientific computing, and the solution of these systems is the most time-consuming part in many large-scale problems. We present a hybrid parallel algorithm, named incomplete WZ parallel solver (IWZPS), for the solution of large sparse nonsingular diagonally dominant linear systems on distributed memory architectures. The method is a combination of both direct and iterative techniques. We compare the present hybrid parallel sparse algorithm IWZPS with the direct and iterative sparse solvers, namely, MUMPS and ILUPACK, respectively. In addition, we compare it with a hybrid parallel solver, DDPS.
SIAM Journal on Computing, 1993
2000
One important mathematical problem in simulation of large electrical circuits is the solution of high-dimensional linear equation systems. The corresponding matrices are real, non-symmetric, very ill-conditioned, have an irregular sparsity pattern, and include a few dense rows and columns.
Acta Numerica, 1993
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, the singular value decomposition, and generalizations of these to two matrices. We consider dense, band and sparse matrices.
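As a tiny illustration of the kind of building block such a survey starts from, the sketch below multiplies matrices block by block; each block product is an independent unit of work that a parallel implementation would assign to a processor, and blocking is also what exposes data reuse on a single node. The block size and the purely serial loops are assumptions made only to keep the example short.

    import numpy as np

    def blocked_matmul(A, B, bs=64):
        """C = A @ B accumulated block by block; every (i, j, k) block product
        is independent work that a parallel code could map to a processor."""
        n, m = A.shape
        m2, p = B.shape
        assert m == m2, "inner dimensions must match"
        C = np.zeros((n, p))
        for i in range(0, n, bs):
            for j in range(0, p, bs):
                for k in range(0, m, bs):
                    C[i:i + bs, j:j + bs] += A[i:i + bs, k:k + bs] @ B[k:k + bs, j:j + bs]
        return C

    A = np.random.rand(200, 150)
    B = np.random.rand(150, 180)
    print(np.allclose(blocked_matmul(A, B), A @ B))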
2007
The efficiency of parallel iterative methods for solving linear systems arising from real-life applications depends greatly on matrix characteristics and on the amount of parallel overhead. It is often assumed that a major part of this overhead is caused by parallel matrix-vector multiplications. However, for difficult large linear systems, the preconditioning operations needed to accelerate convergence must also be performed in parallel and may incur substantial overhead. To obtain an efficient preconditioning, it is desirable to consider certain numerical properties of the matrix in the partitioning process. In general, graph partitioners consider the nonzero structure of a matrix to balance the number of unknowns and to decrease communication volume among parts. The present work builds upon hypergraph partitioning techniques because of their ability to handle nonsymmetric and irregularly structured matrices and because they correctly minimize communication volume. First, several hyperedge weight schemes are proposed to account for the numerical matrix property called diagonal dominance of rows and columns. Then, an algorithm for the independent partitioning of certain submatrices followed by the matching of the obtained parts is presented in detail, along with a proof that it correctly minimizes the total communication volume. For the proposed variants of hypergraph partitioning models, numerical experiments compare the iterations needed to converge, investigate the diagonal dominance of the obtained parts, and report the values of the partitioning cost functions.
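To make the use of numerical information in partitioning concrete, a per-row diagonal-dominance ratio can be computed and handed to the hypergraph partitioner as a hyperedge weight. The snippet below is only a plausible sketch of such a weight with SciPy; the exact weight formulas proposed in the paper, and the interface to a partitioner such as PaToH, are not reproduced here.

    import numpy as np
    import scipy.sparse as sp

    def row_dominance_weights(A):
        """One plausible hyperedge weight: |a_ii| divided by the absolute row
        sum, so a value near 1 marks a strongly diagonally dominant row."""
        A = A.tocsr()
        row_sums = np.asarray(abs(A).sum(axis=1)).ravel()
        diag = np.abs(A.diagonal())
        return np.where(row_sums > 0, diag / np.maximum(row_sums, 1e-300), 0.0)

    A = sp.random(100, 100, density=0.05, format="csr") + 2.0 * sp.eye(100)
    w = row_dominance_weights(A)
    print(w.min(), w.mean())   # weights a partitioner would balance across parts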
Advances in Parallel Computing, 1998
We present parallel preconditioned solvers to find a few extreme eigenvalues and eigenvectors of large sparse matrices based on the Jacobi-Davidson (JD) method by G. L. G. Sleijpen and H. A. van der Vorst. For preconditioning, we apply banded matrices and a new adaptive approach using the QMR iteration. To parallelize the solvers developed, we investigate matrix and vector partitioning as well as dividing the spectrum of the matrix into independent parts. The efficiency of these strategies is demonstrated on the massively parallel systems NEC Cenju-3, Cray T3E, and on workstation clusters.
Parallel Computing, 1997
Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and poor conditioning of the systems arising in many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques for the CG method, in particular on massively parallel machines. Here, the data distribution and the communication scheme for the sparse matrix operations of the preconditioned CG are based on the analysis of the indices of the non-zero elements. Polynomial preconditioning is shown to reduce global synchronizations considerably, and a fully local incomplete Cholesky preconditioner is presented. On a PARAGON XP/S 10 with 138 processors, the developed parallel methods markedly outperform diagonally scaled CG with respect to both scaling behavior and execution time for many matrices from real finite element applications.
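One standard way to realize a polynomial preconditioner is a truncated Neumann series around the Jacobi splitting: its application needs only matrix-vector products and diagonal scalings, with no triangular solves. The sketch below shows that particular choice with SciPy; the Neumann/Jacobi form, the polynomial degree, and the test matrix are illustrative assumptions, not necessarily the polynomial used in the paper.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def neumann_preconditioner(A, degree=3):
        """Apply M^{-1} v ~= sum_{k=0}^{degree} (I - D^{-1}A)^k D^{-1} v,
        a truncated Neumann series around the Jacobi splitting; only
        mat-vecs and diagonal scalings are needed to apply it."""
        D_inv = sp.diags(1.0 / A.diagonal())

        def apply(v):
            y = D_inv @ v
            z = y.copy()
            for _ in range(degree):
                z = y + z - D_inv @ (A @ z)   # z <- y + (I - D^{-1}A) z
            return z

        return spla.LinearOperator(A.shape, matvec=apply)

    # A small SPD finite-difference-style test matrix.
    n = 400
    A = sp.diags([-1.0, 2.1, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)
    x, info = spla.cg(A, b, M=neumann_preconditioner(A))
    print(info)   # 0 means the preconditioned CG iteration converged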
Parallel Computation, 1999
A number of techniques are described for solving sparse linear systems on parallel platforms. The general approach used is a domain-decomposition type method in which a processor is assigned a certain number of rows of the linear system to be solved. Strategies that are discussed include non-standard graph partitioners, and a forced load-balance technique for the local iterations. A common practice when partitioning a graph is to seek to minimize the number of cut-edges and to have an equal number of equations per processor. It is shown that partitioners that take into account the values of the matrix entries may be more effective.
Future Generation Computer Systems, 2005
For the solution of sparse linear systems from circuit simulation whose coefficient matrices include a few dense rows and columns, a parallel iterative algorithm with distributed Schur complement preconditioning is presented. The parallel efficiency of the solver is increased by transforming the equation system into a problem without dense rows and columns as well as by exploitation of parallel graph partitioning methods. The costs of local, incomplete LU decompositions are decreased by fill-in reducing reordering methods of the matrix and a threshold strategy for the factorization. The efficiency of the parallel solver is demonstrated with real circuit simulation problems on PC clusters.
Parallel Computing, 2011
We investigate the efficient iterative solution of large-scale sparse linear systems on shared-memory multiprocessors. Our parallel approach is based on a multilevel ILU preconditioner which preserves the mathematical semantics of the sequential method in ILUPACK. We exploit the parallelism exposed by the task tree corresponding to the nested dissection hierarchy (task parallelism), employ dynamic scheduling of tasks to processors to improve load balance, and formulate all stages of the parallel PCG method conformal with the computation of the preconditioner to increase data reuse. Results on a CC-NUMA platform with 16 processors reveal the parallel efficiency of this solution.
Journal of Computational Physics, 2003
The Jacobi-Davidson (JD) algorithm was recently proposed for evaluating a number of the eigenvalues of a matrix. JD goes beyond pure Krylov-space techniques; it cleverly expands its search space by solving the so-called correction equation, thus in principle providing a more powerful method. Preconditioning the Jacobi-Davidson correction equation is mandatory when large, sparse matrices are analyzed. We considered several preconditioners: classical block-Jacobi and IC(0), together with approximate inverse (AINV or FSAI) preconditioners. The rationale for using approximate inverse preconditioners is their high parallelization potential, combined with their efficiency in accelerating the iterative solution of the correction equation. Analysis was carried out on the sequential performance of preconditioned JD for the spectral decomposition of large, sparse matrices, which originate in the numerical integration of partial differential equations arising in physical and engineering problems. It was found that JD is highly sensitive to preconditioning, and it can display an irregular convergence behavior. We parallelized JD by data-splitting techniques, combining them with techniques to reduce the amount of communication data. Our own parallel, preconditioned code was executed on a dedicated parallel machine, and we present the results of our experiments. Our JD code provides an appreciable degree of parallelism. Its performance was also compared with those of PARPACK and parallel DACG.
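For reference, the correction equation mentioned above is, in the formulation of Sleijpen and van der Vorst, the projected system (I - uu*)(A - θI)(I - uu*) t = -r with t ⊥ u, where (θ, u) is the current Ritz pair with ||u|| = 1 and r = Au - θu is its residual; the block-Jacobi, IC(0), and AINV/FSAI preconditioners discussed in the paper are used to accelerate the inner iterative solution of this projected system.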
2013
Our goal is to show, through several examples, the great progress made in numerical analysis in the past decades, together with the principal problems and relations to other disciplines. We restrict ourselves to numerical linear algebra, or, more specifically, to solving Ax = b where A is a real nonsingular n by n matrix and b a real n-dimensional vector, and to computing eigenvalues of a sparse matrix A. We discuss recent developments in both sparse direct and iterative solvers, as well as fundamental problems in computing eigenvalues. The effects of parallel architectures on the choice of method and on the implementation of codes are stressed throughout the contribution.