This paper addresses the main issues raised during the parallelization of iterative and direct solvers for such systems on distributed-memory multiprocessors. If no preconditioning is considered, iterative solvers are simple to parallelize, as the most time-consuming computational structures are matrix-vector products. Direct methods are much harder to parallelize, as new nonzero values may appear during computation and pivoting operations are usually performed for reasons of numerical stability. Suitable data structures and distributions for sparse solvers are discussed within the framework of a data-parallel environment, and experimentally evaluated and compared with existing solutions.
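The core kernel mentioned here, a row-partitioned sparse matrix-vector product, can be sketched as follows. This is a generic SciPy illustration, not code from the paper; the block partition is simulated sequentially where a real solver would assign the row blocks to separate processes.

```python
import numpy as np
import scipy.sparse as sp

# Toy sparse matrix and a row-block partition across "processes"
# (simulated sequentially; a real code would use MPI ranks).
n, nprocs = 1000, 4
A = sp.random(n, n, density=0.01, format='csr') + sp.eye(n, format='csr')
x = np.random.rand(n)

# Each process owns a contiguous block of rows and multiplies it by the
# full input vector; in a distributed setting the needed x entries would
# first be gathered from their owning processes.
row_blocks = np.array_split(np.arange(n), nprocs)
y = np.concatenate([A[rows, :] @ x for rows in row_blocks])

assert np.allclose(y, A @ x)
```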
2009 Computational Electromagnetics International Workshop, 2009
Abstract. Consider the system Ax = b, where A is a large sparse nonsymmetric matrix. It is assumed that A has no sparsity structure that may be exploited in the solution process, its spectrum may lie on both sides of the imaginary axis, and its symmetric part may be indefinite. For such systems direct methods may be both time consuming and storage demanding, while iterative methods may not converge. In this paper, a hybrid method, which attempts to avoid these drawbacks, is proposed. An LU factorization of A that relies on a strategy of dropping small nonzero elements during the Gaussian elimination process is used as a preconditioner for conjugate gradient-like schemes: ORTHOMIN, GMRES and CGS. Robustness is achieved by altering the drop tolerance and recomputing the preconditioner in the event that the factorization or the iterative method fails. If after a prescribed number of trials the iterative method is still not convergent, then a switch is made to a direct solver. Numerical examples, using matrices from the Harwell-Boeing test matrices, show that this hybrid scheme is often less time consuming and storage demanding than direct solvers, and more robust than iterative methods that depend on preconditioners based on classical positional dropping strategies. I. THE HYBRID ALGORITHM. Consider the system of linear algebraic equations Ax = b, where A is a nonsingular, large, sparse and nonsymmetric matrix. We assume also that the matrix A is generally sparse (i.e., it has neither any special property, such as symmetry and/or positive definiteness, nor any special pattern, such as bandedness, that can be exploited in the solution of the system). Solving such linear systems may be a rather difficult task. This is so because commonly used direct methods (sparse Gaussian elimination) are too time consuming, and iterative methods whose success depends on the matrix having a definite symmetric part, or on the spectrum lying on one side of the imaginary axis, are not robust enough. Direct methods have the advantage that they normally produce a sufficiently accurate solution, although a direct estimate of the accuracy actually achieved requires additional work. On the other hand, when iterative methods converge sufficiently fast, they require computing time that is several orders of magnitude smaller than that of any direct method. This brief comparison of the main properties of direct and iterative methods for the problem at hand shows that the methods of both groups have some advantages and some disadvantages. Therefore it seems worthwhile to design methods that combine the advantages of both groups, while minimizing their disadvantages.
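The retry-and-switch logic described above maps naturally onto standard SciPy building blocks. The sketch below is a loose illustration, not the authors' code: spilu's drop_tol stands in for the drop tolerance, gmres for the Krylov schemes, and splu for the direct fallback; the tolerance schedule is an invented example and a recent SciPy (rtol keyword) is assumed.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu, splu, gmres, LinearOperator

def hybrid_solve(A, b, drop_tols=(1e-2, 1e-3, 1e-4)):
    """Try ILU-preconditioned GMRES with a shrinking drop tolerance;
    fall back to a direct sparse LU solve if all attempts fail."""
    for tau in drop_tols:
        try:
            ilu = spilu(A.tocsc(), drop_tol=tau)   # may fail (zero pivot)
        except RuntimeError:
            continue                               # retry with smaller tau
        M = LinearOperator(A.shape, ilu.solve)
        x, info = gmres(A, b, M=M, rtol=1e-10)
        if info == 0:                              # converged
            return x
    return splu(A.tocsc()).solve(b)                # direct-solver fallback

A = sp.random(500, 500, density=0.01, format='csc') + 10 * sp.eye(500, format='csc')
x = hybrid_solve(A, np.ones(500))
```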
ACM Transactions on Mathematical Software, 2001
This paper provides a comprehensive study and comparison of two state-of-the-art direct solvers for large sparse sets of linear equations on large-scale distributed-memory computers. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. We describe the main algorithmic features of the two solvers and compare their performance characteristics with respect to uniprocessor speed, interprocessor communication, and memory requirements. For both solvers, preorderings for numerical stability and sparsity play an important role in achieving high parallel efficiency. We analyse the results with various ordering algorithms. Our performance analysis is based on data obtained from runs on a 512-processor Cray T3E using a set of matrices from real applications. We also use regular 3D grid problems to study the scalability of the two solvers.
1995
A parallel solvers package of three solvers with a unified user interface is developed for solving a range of sparse symmetric complex linear systems arising from discretization of partial differential equations based on unstructured meshes using finite element, finite difference and finite volume analysis. Once the data interface is set up, the package constructs the sparse symmetric complex matrix, and solves the linear system by the method chosen by the user: either a preconditioned bi-conjugate gradient solver, a two-stage Cholesky LDL^T factorization solver, or a hybrid solver combining the above two methods. A unique feature of the solvers package is that the user deals with local matrices on local meshes on each processor. Scaling problem size N with the number of processors P with N/P fixed, test runs on the Intel Delta up to 128 processors show that the bi-conjugate gradient method scales linearly with N, whereas the two-stage hybrid method scales with √N.
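A minimal sketch of such a unified interface, with SciPy's bicgstab standing in for the preconditioned bi-conjugate gradient solver and splu standing in for the two-stage direct solver (the real package handles complex symmetric systems on distributed local meshes; this toy is real-valued and sequential, and the function name is invented):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab, splu

def solve(A, b, method='iterative'):
    """Single entry point dispatching to an iterative, direct, or hybrid
    path, mimicking a unified user interface over several solvers."""
    if method in ('iterative', 'hybrid'):
        x, info = bicgstab(A, b, rtol=1e-10)
        if info == 0 or method == 'iterative':
            return x                 # hybrid falls through on failure
    return splu(A.tocsc()).solve(b)  # stand-in for the direct stage

A = sp.random(300, 300, density=0.02, format='csc') + 5 * sp.eye(300, format='csc')
x = solve(A, np.ones(300), method='hybrid')
```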
Applicable Algebra in Engineering, Communication and Computing, 2007
An important recent development in the area of solution of general sparse systems of linear equations has been the introduction of new algorithms that allow complete decoupling of symbolic and numerical phases of sparse Gaussian elimination with partial pivoting. This enables efficient solution of a series of sparse systems with the same nonzero pattern but different coefficient values, which is a fairly common situation in practical applications. This paper reports on a shared- and distributed-memory parallel general sparse solver based on these new symbolic and unsymmetric-pattern multifrontal algorithms.
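SciPy does not expose the paper's symbolic/numeric decoupling directly, but the idea of reusing pattern-dependent analysis across systems with identical structure can be approximated by computing a fill-reducing ordering once and reusing it for each numeric factorization. A hedged sketch, with RCM used purely for illustration:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import splu

n = 200
A0 = (sp.random(n, n, density=0.05, format='csc', random_state=0)
      + n * sp.eye(n, format='csc'))
b = np.ones(n)

# "Symbolic" phase: derive an ordering from the nonzero pattern once.
perm = reverse_cuthill_mckee(A0.tocsr(), symmetric_mode=False)

# "Numeric" phase: reuse the ordering for every new set of values that
# shares the pattern (simulated here by perturbing the nonzero values).
for k in range(3):
    A = A0.copy()
    A.data *= 1.0 + 0.01 * k                 # same pattern, new values
    Ap = A[perm, :][:, perm].tocsc()         # apply the precomputed ordering
    lu = splu(Ap, permc_spec='NATURAL')      # skip the internal reordering
    xp = lu.solve(b[perm])
    x = np.empty_like(xp)
    x[perm] = xp                             # undo the permutation
```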
Lecture Notes in Computer Science, 2002
This paper presents an overview of pARMS, a package for solving sparse linear systems on parallel platforms. Preconditioners constitute the most important ingredient in the solution of linear systems arising from realistic scientific and engineering applications. The most common parallel preconditioners used for sparse linear systems adapt domain decomposition concepts to the more general framework of "distributed sparse linear systems". The parallel Algebraic Recursive Multilevel Solver (pARMS) is a recently developed package which integrates variants of both Schwarz procedures and Schur complement-type techniques. This paper discusses a few of the main ideas and design issues of the package. A few details on the implementation of pARMS are provided.
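As a generic illustration of the domain-decomposition idea underlying such preconditioners (not pARMS itself), the following sketch applies a non-overlapping block-Jacobi preconditioner, the simplest one-level additive Schwarz variant, inside SciPy's GMRES; block count and sizes are invented:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu, gmres, LinearOperator

n, nblocks = 800, 4
A = sp.random(n, n, density=0.01, format='csc') + 10 * sp.eye(n, format='csc')
b = np.ones(n)

# Factor each diagonal block ("subdomain") independently; these local
# factorizations are embarrassingly parallel across subdomains.
blocks = np.array_split(np.arange(n), nblocks)
lus = [splu(A[idx, :][:, idx].tocsc()) for idx in blocks]

def apply_prec(r):
    """Additive Schwarz with no overlap: sum of local subdomain solves."""
    z = np.zeros_like(r)
    for idx, lu in zip(blocks, lus):
        z[idx] = lu.solve(r[idx])
    return z

M = LinearOperator(A.shape, apply_prec)
x, info = gmres(A, b, M=M, rtol=1e-8)
```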
Journal of Information Science and Engineering, 2002
The objective of this work is the analysis and prediction of the performance of irregular codes, mainly in their parallel implementations. In particular, this paper focuses on parallel iterative solvers for sparse matrices as a relevant case study for this kind of code. An efficient library of solvers and preconditioners was developed using HPF and MPI as parallel platforms. For this library, models to characterize and predict the behavior of the execution of the methods, preconditioners and kernels were introduced. To show the results of these models, a visualization tool with an easy-to-use GUI was implemented. Finally, results of the prediction models for the codes of the parallel library are presented using the visualization tool.
1995
The individuals and institutions listed on the cover of this document disclaim all warranties with regard to the PCG software package, including all warranties of merchantability and fitness, and any stated express warranties are in lieu of all obligations or liability on the part of these individuals and institutions for damages, including, but not limited to, special, indirect, or consequential damages arising out of or in connection with the use or performance of PCG. In no event will these individuals or institutions be liable for any direct, indirect, special, incidental, or consequential damages arising in connection with use of or inability to use PCG or the documentation. The current version of PCG is preliminary, and the package is under development. Expansion of the package is in progress. Reports of difficulties encountered in using the system or comments and suggestions for improving the package are welcome.
Parallel Computing, 2011
We investigate the efficient iterative solution of large-scale sparse linear systems on shared-memory multiprocessors. Our parallel approach is based on a multilevel ILU preconditioner which preserves the mathematical semantics of the sequential method in ILUPACK. We exploit the parallelism exposed by the task tree corresponding to the nested dissection hierarchy (task parallelism) and employ dynamic scheduling of tasks to processors to improve load balance.
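A toy illustration of tree parallelism over a nested-dissection task hierarchy follows. Levels are synchronized here for simplicity, whereas ILUPACK's scheduler is dynamic and a parent becomes ready as soon as its children complete; the tree and the placeholder work are invented:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy nested-dissection task tree (3 levels): the leaves are independent
# subdomain factorizations; interior nodes are separators that depend on
# their children. Processing level by level honors these dependencies.
levels = [[3, 4, 5, 6], [1, 2], [0]]     # leaves first, root last

def factor_node(node):
    # Placeholder for the local (incomplete) factorization at this node.
    print(f"factoring tree node {node}")

with ThreadPoolExecutor(max_workers=4) as pool:
    for level in levels:
        # All nodes within a level are independent and run concurrently.
        list(pool.map(factor_node, level))
```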
nla.2469, 2022
General sparse hybrid solvers are commonly used kernels for solving a wide range of scientific and engineering problems. This work addresses the current problems of efficiently solving general sparse linear equations with direct/iterative hybrid solvers on many-core distributed clusters. We briefly discuss the solution stages of the Maphys, HIPS, and PDSLin hybrid solvers for large sparse linear systems, along with their major algorithmic differences. In this category of solvers, different methods with sophisticated preconditioning algorithms have been proposed to address the trade-off between memory usage and convergence. Such solutions require a hierarchical level of parallelism, well suited to modern supercomputers, that allows scaling to thousands of processors within a Schur complement framework. We study the effect of reordering and analyze the performance, scalability, and memory usage of each solve phase of the PDSLin, Maphys, and HIPS hybrid solvers using a large set of challenging matrices arising from real applications, and compare the results with the SuperLU_DIST direct solver. We specifically focus on the level of parallelism used by the hybrid solvers and its effect on scalability. The Tuning and Analysis Utilities (TAU) toolkit is employed to profile heap memory usage and to measure communication volume. The tests are run on high-performance, large-memory clusters using up to 512 processors.
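The Schur complement framework shared by these solvers can be summarized in a few lines of dense NumPy. This is a didactic sketch, not any of the named packages: real hybrid solvers factor A_II per subdomain with a direct method and solve the interface system iteratively with a preconditioner, and the sizes below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
ni, ng = 40, 10                       # interior and interface unknowns
A = rng.random((ni + ng, ni + ng)) + (ni + ng) * np.eye(ni + ng)
b = rng.random(ni + ng)

A_II, A_IG = A[:ni, :ni], A[:ni, ni:]
A_GI, A_GG = A[ni:, :ni], A[ni:, ni:]
b_I, b_G = b[:ni], b[ni:]

# Eliminate the interior unknowns (a direct subdomain solve in the hybrid
# codes), leaving the Schur complement system on the interface.
Y = np.linalg.solve(A_II, A_IG)                  # A_II^{-1} A_IG
S = A_GG - A_GI @ Y                              # Schur complement
g = b_G - A_GI @ np.linalg.solve(A_II, b_I)

x_G = np.linalg.solve(S, g)                      # interface solve (iterative in practice)
x_I = np.linalg.solve(A_II, b_I - A_IG @ x_G)    # back-substitution
x = np.concatenate([x_I, x_G])
assert np.allclose(A @ x, b)
```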
IEEE Transactions on Parallel and Distributed Systems, 2015
In parallel linear iterative solvers, sparse matrix-vector multiplication (SpMxV) incurs irregular point-to-point (P2P) communications, whereas inner product computations incur regular collective communications. These P2P communications cause an additional synchronization point with relatively high message latency costs due to small message sizes. In these solvers, each SpMxV is usually followed by an inner product computation that involves the output vector of the SpMxV. Here, we exploit this property to propose a novel parallelization method that avoids the latency costs and synchronization overhead of P2P communications. Our method involves a computational and a communication rearrangement scheme. The computational rearrangement provides an alternative method for forming the input vector of SpMxV and allows P2P and collective communications to be performed in a single phase. The communication rearrangement realizes this opportunity by embedding P2P communications into global collective communication operations. The proposed method guarantees an upper bound on the maximum number of messages communicated, regardless of the sparsity pattern of the matrix. The downside, however, is an increased message volume and a negligible amount of redundant computation. We favor reducing the message latency costs at the expense of increasing message volume. Yet, we propose two iterative-improvement-based heuristics to alleviate the increase in volume through one-to-one task-to-processor mapping. Our experiments on two supercomputers, Cray XE6 and IBM BlueGene/Q, up to 2,048 processors, show that the proposed parallelization method exhibits superior scalable performance compared to the conventional parallelization method.
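A toy mpi4py sketch of the embedding idea: the local inner-product partial and the local piece of the SpMxV input vector travel in a single collective, so no separate reduction round-trip or P2P halo exchange is needed. The paper's scheme uses carefully mapped collectives with bounded volumes; the full Allgather below is a deliberate simplification, and the matrix and vectors are invented.

```python
from mpi4py import MPI
import numpy as np
import scipy.sparse as sp

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_loc = 100                                   # rows owned by this process
A_loc = sp.random(n_loc, n_loc * size, density=0.01, format='csr',
                  random_state=rank)
x_loc = np.full(n_loc, 1.0)
r_loc = np.full(n_loc, 2.0)

# Single communication phase: pack the local inner-product partial
# together with the local piece of x into one buffer.
buf = np.concatenate(([x_loc @ r_loc], x_loc))
allbuf = np.empty(size * (n_loc + 1))
comm.Allgather(buf, allbuf)
allbuf = allbuf.reshape(size, n_loc + 1)

dot = allbuf[:, 0].sum()        # global <x, r>, no extra Allreduce needed
x_full = allbuf[:, 1:].ravel()  # assembled input vector for the SpMxV
y_loc = A_loc @ x_full
```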
Computers & Mathematics with Applications, 1995
Coarse grain parallel codes for solving sparse systems of linear algebraic equations can be developed in several different ways. The following procedure is suitable for some parallel computers. A preliminary reordering of the matrix is first applied to move as many zero elements as possible to the lower left corner. After that the matrix is partitioned into large blocks and the blocks in the lower left corner contain only zero elements. An attempt to obtain a good load-balance is carried out by allowing the diagonal blocks to be rectangular.
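One simple heuristic in the spirit of this preliminary reordering (an invented illustration, not the paper's algorithm) is to sort rows by the column of their first nonzero, which pushes zeros toward the lower-left corner:

```python
import numpy as np
import scipy.sparse as sp

A = sp.random(8, 8, density=0.3, format='csr', random_state=1)

# Sort rows by the column index of their first nonzero: rows whose
# nonzeros start in later columns sink to the bottom, leaving the
# lower-left corner zero-heavy and ready for block partitioning.
first_nz = [row.indices.min() if row.nnz else A.shape[1] for row in A]
order = np.argsort(first_nz, kind='stable')
A_reordered = A[order, :]
```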
Lecture Notes in Computer Science, 2009
The availability of large-scale computing platforms comprised of tens of thousands of multicore processors motivates the need for the next generation of highly scalable sparse linear system solvers. These solvers must optimize parallel performance, processor (serial) performance, as well as memory requirements, while being robust across broad classes of applications and systems. In this paper, we present a new parallel solver that combines the desirable characteristics of direct methods (robustness) and effective iterative solvers (low computational cost), while alleviating their drawbacks (memory requirements, lack of robustness). Our proposed hybrid solver is based on the general sparse solver PARDISO, and the "Spike" family of hybrid solvers. The resulting algorithm, called PSPIKE, is as robust as direct solvers, more reliable than classical preconditioned Krylov subspace methods, and much more scalable than direct sparse solvers. We support our performance and parallel scalability claims using detailed experimental studies and comparison with direct solvers, as well as classical preconditioned Krylov methods.
Algorithms, 2013
At the heart of many computations in science and engineering lies the need to efficiently and accurately solve large sparse linear systems of equations. Direct methods are frequently the method of choice because of their robustness, accuracy and potential for use as black-box solvers. In the last few years, there have been many new developments, and a number of new modern parallel general-purpose sparse solvers have been written for inclusion within the HSL mathematical software library. In this paper, we introduce and briefly review these solvers for symmetric sparse systems. We describe the algorithms used, highlight key features (including bit-compatibility and out-of-core working) and then, using problems arising from a range of practical applications, we illustrate and compare their performances. We demonstrate that modern direct solvers are able to accurately solve systems of order 10^6 in less than 3 minutes on a 16-core machine.
The purpose of this research was to study robust iterative methods for solving sparse (block tridiagonal) nonsymmetric linear systems in a parallel computing environment. A new method was developed which uses block-row symmetric successive overrelaxation (SSOR) with conjugate gradient (CG) acceleration. The method is robust, with convergence assured even for poorly conditioned systems, and is easily implemented in a parallel environment. The method transforms a nonsymmetric system with an arbitrary eigenvalue distribution into a symmetric one with eigenvalues restricted to the interval (0,1). Research included testing of the algorithms on an Alliant FX/8 multiprocessor, where it was demonstrated that the method is very robust and performs better than standard existing methods.
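For reference, SSOR-preconditioned CG on a symmetric positive definite model problem can be sketched as below. This is plain (not block-row) SSOR on a 1-D Laplacian with an illustrative relaxation factor, and it does not reproduce the paper's nonsymmetric-to-symmetric transformation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator, spsolve_triangular

# SPD test matrix: 1-D Laplacian.
n = 400
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

omega = 1.5                                  # illustrative relaxation factor
D = sp.diags(A.diagonal())
L = sp.tril(A, k=-1)
lower = (D / omega + L).tocsr()              # D/omega + L
upper = lower.T.tocsr()                      # D/omega + U (A symmetric)

def ssor_solve(r):
    """Apply the SSOR preconditioner M^{-1} via two triangular solves,
    with M = (omega / (2 - omega)) (D/omega + L) D^{-1} (D/omega + U)."""
    a = spsolve_triangular(lower, (2 - omega) / omega * r, lower=True)
    return spsolve_triangular(upper, A.diagonal() * a, lower=False)

M = LinearOperator(A.shape, ssor_solve)
x, info = cg(A, b, M=M, rtol=1e-10)
```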
Numerical Linear Algebra with Applications, 2018
Large sparse linear systems arise in many areas of scientific computing, and the solution of these systems is the most time-consuming part in many large-scale problems. We present a hybrid parallel algorithm, named incomplete WZ parallel solver (IWZPS), for the solution of large sparse nonsingular diagonally dominant linear systems on distributed memory architectures. The method is a combination of both direct and iterative techniques. We compare the present hybrid parallel sparse algorithm IWZPS with the direct and iterative sparse solvers, namely, MUMPS and ILUPACK, respectively. In addition, we compare it with a hybrid parallel solver, DDPS.
Future Generation Computer Systems, 2005
For the solution of sparse linear systems from circuit simulation, whose coefficient matrices include a few dense rows and columns, a parallel iterative algorithm with distributed Schur complement preconditioning is presented. The parallel efficiency of the solver is increased by transforming the equation system into a problem without dense rows and columns, as well as by exploiting parallel graph partitioning methods. The cost of the local incomplete LU decompositions is decreased by fill-in-reducing reordering methods for the matrix and a threshold strategy for the factorization. The efficiency of the parallel solver is demonstrated on real circuit simulation problems on PC clusters.
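The removal of dense rows and columns can be illustrated with the Sherman-Morrison-Woodbury identity: write A as a sparse part plus a rank-k correction carrying the dense columns, and solve against the sparse part only. A generic NumPy sketch with invented data, not the paper's transformation:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2                          # k dense columns spoil the sparsity
As = np.diag(rng.random(n) + n)        # the "nice" sparse part (diagonal here)
U = rng.random((n, k))                 # dense columns as a rank-k update
V = np.zeros((n, k)); V[:k, :] = np.eye(k)
A = As + U @ V.T
b = rng.random(n)

# Sherman-Morrison-Woodbury: solve with As (cheap, structured) plus a
# small k-by-k correction instead of factoring the dense-augmented A.
As_inv_b = np.linalg.solve(As, b)
As_inv_U = np.linalg.solve(As, U)
small = np.eye(k) + V.T @ As_inv_U            # k x k capacitance matrix
x = As_inv_b - As_inv_U @ np.linalg.solve(small, V.T @ As_inv_b)
assert np.allclose(A @ x, b)
```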
The Journal of Supercomputing, 2004
In this paper, a comprehensive parallel library of sparse iterative methods and preconditioners, written in HPF and MPI, is presented, along with a model for predicting the performance of these codes. This model can be used both by users and by library developers to optimize the efficiency of the codes, as well as to simplify their use. The information offered by this model combines theoretical features of the methods and preconditioners with certain practical considerations and predictions about aspects of the performance of their execution on distributed-memory multiprocessors.
Parallel Numerical Computation with Applications, 1999
Many problems in engineering and scientific domains require solving large sparse systems of linear equations as a computationally intensive step towards the final solution. It has long been a challenge to develop efficient parallel formulations of sparse direct solvers due to the several different complex steps involved in the process. In this paper, we describe PSPASES, one of the first efficient, portable, and robust scalable parallel solvers for sparse symmetric positive definite linear systems that we have developed. We discuss the algorithmic and implementation issues involved in its development, and present performance and scalability results on the Cray T3E and SGI Origin 2000. PSPASES could solve the largest sparse system (1 million equations) ever solved by a direct method, with the highest performance (51 GFLOPS for Cholesky factorization) ever reported.
2000
One important mathematical problem in simulation of large electrical circuits is the solution of high-dimensional linear equation systems. The corresponding matrices are real, non-symmetric, very ill-conditioned, have an irregular sparsity pattern, and include a few dense rows and columns.