2004, The Journal of Supercomputing
In this paper, an exhaustive parallel library of sparse iterative methods and preconditioners in HPF and MPI is developed, and a model for predicting the performance of these codes is presented. This model can be used both by users and by library developers to optimize the efficiency of the codes, as well as to simplify their use. The information offered by this model combines theoretical features of the methods and preconditioners with practical considerations and predictions about the performance of their execution on distributed memory multiprocessors.
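A minimal sketch of what such a prediction model can look like: per-iteration time estimated as computation divided across p processors plus communication terms. All constants and cost formulas below are illustrative assumptions, not the paper's model.

```python
# Hedged sketch of an analytical per-iteration cost model for a sparse
# iterative solver on p processors. t_flop, t_lat, t_word are assumed values.

def predicted_iteration_time(nnz, n, p, t_flop=1e-9, t_lat=1e-6, t_word=4e-9):
    """Estimate one iteration: SpMV + dot products + vector updates."""
    flops = 2 * nnz + 6 * n            # SpMV plus AXPYs/dots (rough count)
    comp = flops * t_flop / p          # computation, assumed perfectly split
    # communication: halo exchange for SpMV + allreduce for dot products
    halo = t_lat + (n / p) * t_word    # boundary assumed ~ local block size
    allreduce = 2 * t_lat * max(1, p.bit_length())  # ~log2(p) latency terms
    return comp + halo + allreduce

if __name__ == "__main__":
    for p in (1, 2, 4, 8, 16, 32):
        print(p, predicted_iteration_time(nnz=5_000_000, n=1_000_000, p=p))
```

Sweeping p like this is how such a model can guide the choice of processor count before any run takes place.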
Journal of Information Science and Engineering, 2002
The objective of this work is the analysis and prediction of the performance of irregular codes, mainly in their parallel implementations. In particular, this paper focuses on parallel iterative solvers for sparse matrices as a relevant case study for this kind of code. An efficient library of solvers and preconditioners was developed using HPF and MPI as parallel platforms. For this library, models to characterize and predict the behavior of the execution of the methods, preconditioners, and kernels were introduced. To show the results of these models, a visualization tool with an easy-to-use GUI was implemented. Finally, results of the prediction models for the codes of the parallel library are presented using the visualization tool.
2002
In this paper, a tool for predicting and visualizing the performance of iterative methods is presented. These codes come from an exhaustive parallel library of sparse iterative methods and preconditioners in HPF and MPI, developed in a previous work. The tool can be used both by users and by library developers to optimize the efficiency of the codes, as well as to simplify their use. The information offered by this tool combines theoretical features of the methods and preconditioners with practical considerations and predictions about the performance of their execution on distributed memory multiprocessors.
This paper addresses the main issues raised during the parallelization of iterative and direct solvers for such systems on distributed memory multiprocessors. If no preconditioning is considered, iterative solvers are simple to parallelize, as the most time-consuming computational structures are matrix-vector products. Direct methods are much harder to parallelize, as new nonzero values may appear during the computation and pivoting is usually required for numerical stability. Suitable data structures and distributions for sparse solvers are discussed within the framework of a data-parallel environment, and experimentally evaluated and compared with existing solutions.
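Since the matrix-vector product dominates an unpreconditioned iterative solve, a compact illustration helps. The following Python sketch multiplies a matrix stored in CSR form (a common sparse storage scheme, assumed here rather than taken from the paper) by a vector.

```python
import numpy as np

def csr_matvec(indptr, indices, data, x):
    """y = A @ x for A stored in CSR form (row pointers, column indices, values)."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = np.dot(data[lo:hi], x[indices[lo:hi]])
    return y

# 2x2 example: A = [[4, -1], [0, 2]]
indptr, indices, data = np.array([0, 2, 3]), np.array([0, 1, 1]), np.array([4.0, -1.0, 2.0])
print(csr_matvec(indptr, indices, data, np.array([1.0, 1.0])))  # [3. 2.]
```

The indirect access x[indices[...]] is exactly the irregularity that makes distributing such codes hard: which x entries a row needs depends on the sparsity pattern.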
Future Generation Computer Systems, 2003
The selection of the best method and preconditioner for solving a sparse linear system is as decisive as the efficient parallelization of the selected method. We propose a tool for helping to solve both problems on distributed memory multiprocessors using iterative methods. Based on a previously developed library of HPF and Message Passing Interface (MPI) codes, a performance prediction model is developed and a visualization tool (AVISPA) is proposed. The tool combines theoretical features of the methods and preconditioners with practical considerations and predictions about aspects of the execution performance (computational cost, communications overhead, etc.). It offers detailed information on all the topics that can be useful for selecting the most suitable method and preconditioner. It can also provide information on different parallel implementations of the code (HPF and MPI) with varying numbers of available processors.
SIAM Journal on Scientific Computing, 2007
This paper addresses the parallelization of the preconditioned iterative methods that use explicit preconditioners such as approximate inverses. Parallelizing a full step of these methods requires the coefficient and preconditioner matrices to be well partitioned. We first show that different methods impose different partitioning requirements for the matrices. Then we develop hypergraph models to meet those requirements. In particular, we develop models that enable us to obtain partitionings on the coefficient and preconditioner matrices simultaneously. Experiments on a set of unsymmetric sparse matrices show that the proposed models yield effective partitioning results. A parallel implementation of the right preconditioned BiCGStab method on a PC cluster verifies that the theoretical gains obtained by the models hold in practice.
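As a rough illustration of the objective such hypergraph models capture, the following sketch (names illustrative, SciPy assumed) counts the communication volume induced by a given row-wise partition for parallel y = A x: each column contributes its connectivity minus one, the standard connectivity-1 metric of hypergraph partitioning.

```python
import numpy as np
import scipy.sparse as sp

def comm_volume(A_any, part):
    """Words of x sent between parts for y = A @ x under a row partition.

    part[i] owns row i and vector entry x[i] (conformal partition).
    Column j contributes (number of distinct parts touching it) - 1.
    """
    Ac = sp.csr_matrix(A_any).tocsc()
    vol = 0
    for j in range(Ac.shape[1]):
        rows = Ac.indices[Ac.indptr[j]:Ac.indptr[j + 1]]
        parts = set(part[r] for r in rows) | {part[j]}
        vol += len(parts) - 1
    return vol

A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(6, 6))
print(comm_volume(A, part=[0, 0, 0, 1, 1, 1]))  # 2: one per boundary column
```

Minimizing this quantity over partitions, while balancing the per-part work, is what the partitioning models are designed to do; the paper's contribution is doing it for coefficient and preconditioner matrices simultaneously.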
Computers & Mathematics with Applications, 2005
A performance model is constructed for parallel iterative numerical methods under the assumption of a message-passing computing system. We argue that the speedup of parallel iterative methods is mainly determined by the speedup of a single iterative step. Using the theoretical model, it is shown why explicit iterative methods for ordinary differential equations are inefficient when implemented on distributed memory multiprocessor systems. Numerical tests in parallel and distributed computing environments confirm the correctness of the theoretical model, at least in the case of iterative methods for ordinary differential equations and time-dependent partial differential equations.
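A hedged sketch of the central observation: because an iterative method repeats essentially the same step, the iteration count cancels and the overall speedup reduces to the speedup of one step. The cost split below is an illustrative assumption, not the paper's exact model.

```python
def step_speedup(t_comp, t_comm, p):
    """Speedup of one iterative step: serial time over parallel time."""
    return t_comp / (t_comp / p + t_comm)

def method_speedup(t_comp, t_comm, p, k=1000):
    """k identical steps: k cancels, leaving exactly the one-step speedup."""
    return (k * t_comp) / (k * (t_comp / p + t_comm))

# Explicit ODE integrators do little work per step, so t_comm dominates and
# the speedup stays near 1 -- the inefficiency the model explains.
print(step_speedup(t_comp=1e-4, t_comm=1e-4, p=16))  # ~0.94, despite 16 CPUs
```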
Applied Mathematics and Computation, 2002
This paper is concerned with a new approach to preconditioning for large, sparse linear systems. A procedure for computing an incomplete factorization of the inverse of a nonsymmetric matrix is developed, and the resulting factorized sparse approximate inverse is used as an explicit preconditioner for conjugate gradient-type methods. Some theoretical properties of the preconditioner are discussed, and numerical experiments on test matrices from the Harwell-Boeing collection and from Tim Davis's collection are presented. Our results indicate that the new preconditioner is cheaper to construct than other approximate inverse preconditioners. Furthermore, the new technique ensures convergence rates of the preconditioned iteration which are comparable with those obtained with standard implicit preconditioners.
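A sketch of how such a factorized approximate inverse is used in practice: the preconditioner is applied by sparse matrix-vector products inside a Krylov solver, with no triangular solves. The factors below are crude Jacobi-like placeholders, not the paper's construction; SciPy is assumed.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
A = sp.diags([-1.0, 4.0, -1.2], [-1, 0, 1], shape=(n, n), format="csr")

# Placeholder factors: a factorized approximate inverse has the form
# M = G_L @ G_U ~ inv(A); here we use a crude diagonal stand-in.
G_L = sp.eye(n, format="csr")
G_U = sp.diags(1.0 / A.diagonal())

# Applying M is just sparse matrix-vector products -- the "explicit" part.
M = spla.LinearOperator((n, n), matvec=lambda v: G_L @ (G_U @ v))
x, info = spla.bicgstab(A, np.ones(n), M=M)
print("converged" if info == 0 else f"breakdown/limit, info={info}")
```

This is also why approximate inverses parallelize well: applying M has the same communication structure as the SpMV the solver already performs.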
IEEE Transactions on Parallel and Distributed Systems, 2015
In parallel linear iterative solvers, sparse matrix-vector multiplication (SpMxV) incurs irregular point-to-point (P2P) communications, whereas inner product computations incur regular collective communications. These P2P communications cause an additional synchronization point with relatively high message latency costs due to small message sizes. In these solvers, each SpMxV is usually followed by an inner product computation that involves the output vector of SpMxV. Here, we exploit this property to propose a novel parallelization method that avoids the latency costs and synchronization overhead of P2P communications. Our method involves a computational and a communication rearrangement scheme. The computational rearrangement provides an alternative method for forming the input vector of SpMxV and allows P2P and collective communications to be performed in a single phase. The communication rearrangement realizes this opportunity by embedding P2P communications into global collective communication operations. The proposed method guarantees an upper bound on the maximum number of messages communicated, regardless of the sparsity pattern of the matrix. The downside, however, is an increased message volume and a negligible amount of redundant computation. We favor reducing the message latency costs at the expense of increasing message volume. Yet, we propose two iterative-improvement-based heuristics to alleviate the increase in volume through one-to-one task-to-processor mapping. Our experiments on two supercomputers, Cray XE6 and IBM BlueGene/Q, up to 2,048 processors show that the proposed parallelization method exhibits superior scalable performance compared to the conventional parallelization method.
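For orientation, here is the conventional two-phase pattern the paper restructures, sketched with mpi4py (an assumption, not the authors' code): a P2P halo phase feeding the local SpMxV, followed by the collective allreduce of the inner product. The paper's rearrangement folds the first phase into the second.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def spmxv_then_dot(A_local, x_local, halo_reqs=(), halo_bufs=()):
    """Conventional pattern: P2P halo phase, local SpMxV, then collective."""
    MPI.Request.Waitall(list(halo_reqs))       # P2P phase: extra sync point
    x_full = np.concatenate([x_local, *halo_bufs]) if halo_bufs else x_local
    y_local = A_local @ x_full                 # local SpMxV
    local_dot = float(np.dot(y_local, x_local))
    return y_local, comm.allreduce(local_dot, op=MPI.SUM)  # collective phase

if __name__ == "__main__":
    x_local = np.full(4, comm.Get_rank() + 1.0)
    A_local = np.eye(4)                        # block-diagonal toy case: no halo
    _, g = spmxv_then_dot(A_local, x_local)
    if comm.Get_rank() == 0:
        print("global <Ax, x> =", g)
```

The Waitall and the allreduce are two separate latency-bound events per iteration; embedding the P2P data into the collective removes one of them at the price of extra volume.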
Parallel Computing, 2011
We investigate the efficient iterative solution of large-scale sparse linear systems on shared-memory multiprocessors. Our parallel approach is based on a multilevel ILU preconditioner which preserves the mathematical semantics of the sequential method in ILUPACK. We exploit the parallelism exposed by the task tree corresponding to the nested dissection hierarchy (task parallelism) and employ dynamic scheduling of tasks to processors to improve the load balance.
International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2022
Preconditioned parallel solvers based on Krylov iterative methods are widely used in scientific and engineering applications. Communication overhead is a critical issue when executing these solvers on large-scale massively parallel supercomputers. In previous work, we introduced communication-computation overlapping with dynamic loop scheduling of OpenMP into the sparse matrix-vector multiplication (SpMV) process of a parallel iterative solver based on the Conjugate Gradient (CG) method in a parallel finite element application (GeoFEM/Cube) on multicore and manycore clusters. In the present work, we first re-evaluated the method on our new system, Wisteria/BDEC-01 (Odyssey) (Fujitsu PRIMEHPC FX1000 with A64FX), and obtained a significant performance improvement of 25-30% for the parallel iterative solver at 2,048 nodes (98,304 cores). Moreover, we propose a new reordering method for communication-computation overlapping in ICCG solvers for a parallel finite volume application (Poisson3D/Dist), which attained a 5-12% improvement at 1,024 nodes of Odyssey.
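A minimal mpi4py sketch of the communication-computation overlap idea (not the authors' OpenMP-based implementation): post non-blocking halo exchange, compute the rows with no remote dependencies, then finish the boundary rows once the halo has arrived. Ranks exchange a single value in a ring purely for illustration.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
right, left = (rank + 1) % size, (rank - 1) % size

x_local = np.full(8, float(rank + 1))
halo = np.empty(1)

send = comm.Isend(x_local[-1:], dest=right)    # post communication first...
recv = comm.Irecv(halo, source=left)

interior = 2.0 * x_local[1:]                   # ...compute rows with no remote deps
MPI.Request.Waitall([send, recv])              # overlap window ends here
boundary = x_local[0] + halo[0]                # finish rows needing the halo
y_local = np.concatenate([[boundary], interior])
```

The reordering the paper proposes serves exactly this split: it groups the rows that need remote data so the interior computation is as large as possible.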
Applied Numerical Mathematics, 1995
P_SPARSLIB is a library of portable FORTRAN routines for parallel sparse matrix computations. The current thrust of the library is in iterative solution techniques. In this paper we present the “accelerators” part of the library, which comprise the best known of the Krylov subspace techniques. The iterative solution module is implemented in reverse communication mode in order to allow any preconditioner to be combined with the package. In addition, this mechanism allows us to ensure portability, since the communication calls required in the iterative solution process are hidden in the dot-product, the matrix-vector product and preconditioning operations.
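The reverse-communication idea can be sketched compactly: the solver returns control to the caller whenever it needs a matrix-vector product or a preconditioning step, so those operations, and any communication hidden inside them, stay outside the library. A Python generator stands in here for the FORTRAN flag-based protocol; the trivial Richardson iteration is an illustrative choice, not one of the package's accelerators.

```python
import numpy as np

def richardson_rc(b, tol=1e-8, maxit=200):
    """Trivial accelerator in reverse-communication style: yields requests."""
    x = np.zeros_like(b)
    for _ in range(maxit):
        Ax = yield ("matvec", x)        # caller performs y = A @ x
        r = b - Ax
        if np.linalg.norm(r) < tol:
            break
        z = yield ("precond", r)        # caller performs z = M^{-1} r
        x = x + z
    yield ("done", x)

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
it = richardson_rc(b)
kind, payload = next(it)
while kind != "done":
    # the caller owns A and M: here a dense matvec and a Jacobi preconditioner
    result = A @ payload if kind == "matvec" else payload / np.diag(A)
    kind, payload = it.send(result)
print("solution:", payload)
```

Because the library never touches A or M, any storage scheme, preconditioner, or parallel communication pattern can be plugged in, which is exactly the portability argument made above.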
Lecture Notes in Computer Science, 1996
A parallel implementation of a sparse approximate inverse (spai) preconditioner for distributed memory parallel processors (dmpp) is presented. The fundamental spai algorithm is known to be a useful tool for improving the convergence of iterative solvers for ill-conditioned linear systems. The parallel implementation (parspai) exploits the inherent parallelism in the spai algorithm and the data locality on the dmpps to solve structurally symmetric (but non-symmetric) matrices, which typically arise when solving partial differential equations (pdes). Some initial performance results are presented which suggest the usefulness of parspai for tackling such large systems on present-day dmpps in a reasonable time. The parspai preconditioner is implemented using the Message Passing Interface (mpi) and is embedded in the parallel library for unstructured mesh problems (plump).
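The kernel of the spai idea fits in a few lines: each column of the approximate inverse solves a small, independent least-squares problem over a prescribed sparsity pattern, which is where the inherent parallelism comes from. The fixed pattern below (pattern of A itself) is an illustrative simplification; the real algorithm enlarges patterns adaptively.

```python
import numpy as np

def spai_column(A, j, pattern):
    """Minimize ||A m_j - e_j||_2 with m_j nonzero only on `pattern`."""
    rows = np.unique(np.nonzero(A[:, pattern])[0])  # rows hit by the pattern
    e_j = (rows == j).astype(float)                 # e_j restricted to those rows
    coeffs, *_ = np.linalg.lstsq(A[np.ix_(rows, pattern)], e_j, rcond=None)
    m_j = np.zeros(A.shape[1])
    m_j[pattern] = coeffs
    return m_j

A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
M = np.column_stack([spai_column(A, j, np.nonzero(A[:, j])[0]) for j in range(3)])
print(np.round(A @ M, 3))  # should be close to the identity
```

Since the columns share no data, a dmpp can compute them fully in parallel, which is the property parspai exploits.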
2000
An important mathematical problem in the simulation of large electrical circuits is the solution of high-dimensional systems of linear equations. The corresponding matrices are real, non-symmetric, very ill-conditioned, have an irregular sparsity pattern, and include a few dense rows and columns.
ACM Transactions on Mathematical Software, 2001
This paper provides a comprehensive study and comparison of two state-of-the-art direct solvers for large sparse sets of linear equations on large-scale distributed-memory computers. One is a multifrontal solver called MUMPS, the other is a supernodal solver called SuperLU. We describe the main algorithmic features of the two solvers and compare their performance characteristics with respect to uniprocessor speed, interprocessor communication, and memory requirements. For both solvers, preorderings for numerical stability and sparsity play an important role in achieving high parallel efficiency. We analyse the results with various ordering algorithms. Our performance analysis is based on data obtained from runs on a 512-processor Cray T3E using a set of matrices from real applications. We also use regular 3D grid problems to study the scalability of the two solvers.
1995
A parallel solvers package of three solvers with a unified user interface is developed for solving a range of sparse symmetric complex linear systems arising from the discretization of partial differential equations on unstructured meshes using finite element, finite difference, and finite volume analysis. Once the data interface is set up, the package constructs the sparse symmetric complex matrix and solves the linear system by the method chosen by the user: either a preconditioned bi-conjugate gradient solver, a two-stage Cholesky LDL^T factorization solver, or a hybrid solver combining the above two methods. A unique feature of the solvers package is that the user deals with local matrices on local meshes on each processor. Scaling problem size N with the number of processors P with N/P fixed, test runs on the Intel Delta up to 128 processors show that the bi-conjugate gradient method scales linearly with N, whereas the two-stage hybrid method scales with TN.
Lecture Notes in Computer Science, 2006
The purpose of our work is to provide a method which exploits the parallel blockwise algorithmic approach used in the framework of high performance sparse direct solvers in order to develop robust preconditioners based on a parallel incomplete factorization. The idea is then to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers.
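For contrast, the scalar incomplete factorization such blockwise methods aim to improve on can be sketched with SciPy's threshold ILU, used as a preconditioner for an iterative solve. This is an illustrative stand-in, not the paper's adaptive blockwise factorization.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 500
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
ilu = spla.spilu(A, drop_tol=1e-3)             # scalar incomplete LU (threshold)
M = spla.LinearOperator((n, n), matvec=ilu.solve)
x, info = spla.gmres(A, np.ones(n), M=M)
print("converged" if info == 0 else f"info={info}")
```

Replacing the scalar entries by dense blocks, as the paper proposes, trades extra fill for BLAS-3 kernels and better numerical robustness.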
Future Generation Computer Systems, 1998
Writing efficient iterative solvers for irregular sparse matrices in High Performance Fortran (HPF) is hard. The locality in the computations is unclear, and for efficiency we use storage schemes that obscure any structure in the matrix. Moreover, the limited capabilities of HPF to distribute and align data structures make it hard to implement the desired distributions, or to express them in a way that lets the compiler generate an efficient implementation. We propose techniques to handle these problems. We combine strategies that have become popular in message-passing parallel programming, like mesh partitioning and splitting the matrix into local submatrices, with the functionality of HPF and HPF compilers, like the implicit handling of communication and distribution. The implementation of these techniques in HPF is not trivial, and we describe in detail how we propose to solve the problems. Our results demonstrate that very efficient implementations are possible. We indicate how some of the 'approved extensions' of HPF-2.0 can be used, but they do not solve all problems. For comparison we show the results for regular sparse matrices.
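The "local submatrices" idea borrowed from message-passing practice can be sketched as follows (illustrative Python, dense for brevity): each processor's row block is split by column ownership, so the product becomes y = A_loc x_own + A_rem x_halo, separating purely local work from the part that needs communicated entries.

```python
import numpy as np

def split_rows(A_rows, owned_cols):
    """Split this processor's row block by column ownership."""
    mask = np.zeros(A_rows.shape[1], dtype=bool)
    mask[owned_cols] = True
    return A_rows[:, mask], A_rows[:, ~mask]    # (local part, remote part)

A_rows = np.arange(12.0).reshape(3, 4)          # this processor owns 3 rows
A_loc, A_rem = split_rows(A_rows, owned_cols=[0, 1])
x_own, x_halo = np.ones(2), np.ones(2)          # halo entries come from neighbors
y_local = A_loc @ x_own + A_rem @ x_halo        # the two-part product
```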
Applied Numerical Mathematics, 1991
Heroux, M.A., P. Vu and C. Yang, A parallel preconditioned conjugate gradient package for solving sparse linear systems on a Cray Y-MP, Applied Numerical Mathematics 8 (1991) 93-115. In this paper we discuss current activities at Cray Research to develop general-purpose, production-quality software for the efficient solution of sparse linear systems. In particular, we discuss our development of a package of iterative methods that includes conjugate gradient and related methods (GMRES, ORTHOMIN and others) along with several preconditioners (incomplete Cholesky and LU factorization and polynomial). Vector and parallel performance issues are discussed as well as package design. Also, benchmarks on a wide variety of real-life problems are presented to assess the robustness and performance of methods implemented in our software. For symmetric positive-definite problems, we also compare the performance of the preconditioned conjugate gradient code with our parallel implementation of the multifrontal method for sparse Cholesky factorization.
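Of the preconditioners listed, the polynomial one is the easiest to sketch: approximate inv(A) by a truncated Neumann series in the Jacobi-scaled matrix, so each application costs only matrix-vector products, which is what makes it vectorize and parallelize well on machines like the Y-MP. The degree and scaling below are illustrative choices, not the package's defaults.

```python
import numpy as np

def neumann_precond_apply(A, r, degree=3):
    """z ~ inv(A) @ r via inv(A) ~ (I + N + ... + N^k) D^{-1},
    where D = diag(A) and N = I - D^{-1} A."""
    d = np.diag(A)
    z = term = r / d
    for _ in range(degree):
        term = term - (A @ term) / d            # term <- N @ term
        z = z + term
    return z

A = np.array([[4.0, -1.0], [-1.0, 3.0]])
print(neumann_precond_apply(A, np.ones(2)))     # ~ np.linalg.solve(A, ones)
```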
Lecture Notes in Computer Science, 2002
This paper presents an overview of pARMS, a package for solving sparse linear systems on parallel platforms. Preconditioners constitute the most important ingredient in the solution of linear systems arising from realistic scientific and engineering applications. The most common parallel preconditioners used for sparse linear systems adapt domain decomposition concepts to the more general framework of "distributed sparse linear systems". The parallel Algebraic Recursive Multilevel Solver (pARMS) is a recently developed package which integrates variants of both Schwarz procedures and Schur complement-type techniques. This paper discusses a few of the main ideas and design issues of the package, and provides some details on the implementation of pARMS.
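The simplest member of the Schwarz family that such packages generalize is non-overlapping additive Schwarz, i.e. block Jacobi, where each "subdomain" solves only its own diagonal block, independently and hence in parallel. A short sketch, with illustrative block boundaries:

```python
import numpy as np

def block_jacobi_apply(A, r, blocks):
    """z = M^{-1} r with M = blockdiag(A_11, ..., A_kk): each subdomain
    solves its own diagonal block, independently of the others."""
    z = np.zeros_like(r)
    for idx in blocks:
        z[idx] = np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return z

A = np.array([[4.0, -1, 0, 0], [-1, 4, -1, 0], [0, -1, 4, -1], [0, 0, -1, 4]])
z = block_jacobi_apply(A, np.ones(4), blocks=[np.array([0, 1]), np.array([2, 3])])
```

Overlapping the blocks gives classical additive Schwarz; eliminating interior unknowns first leads to the Schur complement techniques the package combines with it.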