2011, ACM Communications in Computer Algebra
We demonstrate new routines for sparse multivariate polynomial multiplication and division over the integers that we have integrated into Maple 14 through the expand and divide commands. These routines are currently the fastest available, and the multiplication routine is parallelized with superlinear speedup. The performance of Maple is significantly improved. We describe our polynomial data structure and compare it with Maple's. Then we present benchmarks comparing Maple 14 with Maple 13, Magma, Mathematica, Singular, Pari, and Trip.
Computer Mathematics, 2014
We demonstrate how a new data structure for sparse distributed polynomials in the Maple kernel significantly accelerates several key Maple library routines. The POLY data structure and its associated kernel operations (degree, coeff, subs, has, diff, eval, ...) are programmed for high scalability with very low overhead. This enables polynomials to have tens of millions of terms, increases parallel speedup in existing routines and dramatically improves the performance of high-level Maple library routines.
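To illustrate the packing idea, here is a hypothetical C sketch (not Maple's actual kernel code) of how a monomial in three variables, together with its total degree, can be packed into a single 64-bit word so that monomial comparison and monomial multiplication each become a single machine operation. The field widths and the pack3 helper are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical packed monomial in the spirit of a packed distributed
   representation: for 3 variables, store the total degree and the
   three exponents in one 64-bit word, 16 bits per field:
       [ deg | e_x | e_y | e_z ]
   Comparing two words as unsigned integers orders monomials in graded
   lexicographic order, and adding two words multiplies monomials,
   provided no field overflows. */
static inline uint64_t pack3(unsigned ex, unsigned ey, unsigned ez) {
    uint64_t deg = ex + ey + ez;
    return (deg << 48) | ((uint64_t)ex << 32) | ((uint64_t)ey << 16) | ez;
}

int main(void) {
    uint64_t m1 = pack3(2, 1, 0);   /* x^2*y */
    uint64_t m2 = pack3(0, 1, 3);   /* y*z^3 */
    uint64_t m3 = m1 + m2;          /* x^2*y^2*z^3 in one machine add */
    printf("m1 > m2 in grlex: %d\n", m1 > m2);   /* 0: deg 3 < deg 4 */
    printf("total degree of product: %llu\n",
           (unsigned long long)(m3 >> 48));      /* 7 */
    return 0;
}
```

Packing monomials into machine words this way is what lets kernel operations such as degree and has scan a polynomial as a flat array of words rather than chase pointers.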
ACM Sigsam Bulletin, 2009
One of the main successes of the computer algebra community in the last 30 years has been the discovery of algorithms, called modular methods, that make it possible to keep the swell of intermediate expressions under control. Without these methods, many applications of computer algebra would not be possible and the impact of computer algebra in scientific computing would be severely reduced.
We report on new code for sparse multivariate polynomial multiplication and division that we have recently integrated into Maple as part of our MITACS project at Simon Fraser University. Our goal was to try to beat Magma, which is widely viewed in the computer algebra community as having state-of-the-art polynomial algebra. Here we give benchmarks comparing our implementation for multiplication and division with the Magma, Maple, Singular, Trip and Pari computer algebra systems. Our algorithms use a binary heap to multiply and divide using very little working memory. Details of our work may be found in [7] and [8].
ACM SIGSAM Bulletin, 2003
How should one design and implement a program for the multiplication of sparse polynomials? This is a simple question, necessarily addressed by the builders of any computer algebra system (CAS). To examine a few options we start with a single easily-stated computation which we believe represents a useful benchmark of "medium difficulty" for CAS designs. We describe a number of design options and their effects on performance. We also examine the performance of a variety of commercial and freely-distributed systems. Important considerations include the cost of high-precision (exact) integer arithmetic and the effective use of cache memory.
2009
We present a high performance algorithm for multiplying sparse distributed polynomials using a multicore processor. Each core uses a heap of pointers to multiply parts of the polynomials using its local cache. Intermediate results are written to buffers in shared cache and the cores take turns combining them to form the result. A cooperative approach is used to balance the load and improve scalability, and the extra cache from each core produces a superlinear speedup in practice. We present benchmarks comparing our parallel routine to a sequential version and to the routines of other computer algebra systems.
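As a minimal sketch of the work-splitting structure only, the following pthreads program assigns each thread a strip of f to multiply by all of g into a private buffer, then combines the buffers at the end. Dense integer arrays stand in here for the paper's sorted sparse terms, heaps of pointers, and shared-cache buffers; all sizes and names are illustrative.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define N 1024          /* coefficients in each input polynomial */
#define T 4             /* worker threads */

static int f[N], g[N];
static int buf[T][2*N-1];          /* one private output buffer per thread */

typedef struct { int lo, hi, id; } job;

/* Multiply the strip f[lo..hi) by all of g into this thread's buffer. */
static void *worker(void *arg) {
    job *j = arg;
    for (int i = j->lo; i < j->hi; i++)
        for (int k = 0; k < N; k++)
            buf[j->id][i+k] += f[i] * g[k];
    return NULL;
}

int main(void) {               /* compile with -lpthread */
    for (int i = 0; i < N; i++) { f[i] = i % 7; g[i] = i % 5; }
    pthread_t th[T]; job jb[T];
    for (int t = 0; t < T; t++) {
        jb[t] = (job){ t*N/T, (t+1)*N/T, t };
        pthread_create(&th[t], NULL, worker, &jb[t]);
    }
    int r[2*N-1]; memset(r, 0, sizeof r);
    for (int t = 0; t < T; t++) {       /* combine the private buffers */
        pthread_join(th[t], NULL);
        for (int i = 0; i < 2*N-1; i++) r[i] += buf[t][i];
    }
    printf("r[0]=%d r[%d]=%d\n", r[0], 2*N-2, r[2*N-2]);
    return 0;
}
```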
2010
We present a Las Vegas algorithm for interpolating a sparse multivariate polynomial over a finite field, represented with a black box. Our algorithm adapts the 1988 algorithm of Ben-Or and Tiwari for interpolating polynomials over rings of characteristic zero to work over characteristic p by performing additional probes.
Journal of Symbolic Computation, 2011
In 1974, Johnson showed how to multiply and divide sparse polynomials using a binary heap. This paper introduces a new algorithm that uses a heap to divide with the same complexity as multiplication. It is a fraction-free method that also reduces the number of integer operations for division of polynomials with integer coefficients over the rationals. Heap-based algorithms use very little memory and do not generate garbage. They can run in the CPU cache and achieve high performance. We compare our C implementation of sparse polynomial multiplication and division with integer coefficients to the routines of existing computer algebra systems.
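To make the mechanism concrete, here is a minimal C sketch of Johnson-style heap multiplication for sparse univariate polynomials, a simplification of what the paper describes: machine-word coefficients and exponents stand in for the multiprecision integers and packed multivariate monomials of real implementations, and all names are illustrative. The heap holds one pointer per term of f, keyed by the exponent of the pending partial product, so at most |f| entries are live and the output comes out in sorted order.

```c
#include <stdio.h>
#include <stdlib.h>

/* A sparse polynomial: terms sorted by decreasing exponent. */
typedef struct { long long c; unsigned long e; } term;
typedef struct { term *t; int n; } poly;

/* Heap entry: the pending product f[i]*g[j], keyed by exponent sum. */
typedef struct { unsigned long e; int i, j; } hent;

static void heap_push(hent *h, int *hn, hent x) {
    int k = (*hn)++;
    h[k] = x;
    while (k > 0 && h[(k-1)/2].e < h[k].e) {     /* bubble up (max-heap) */
        hent tmp = h[k]; h[k] = h[(k-1)/2]; h[(k-1)/2] = tmp;
        k = (k-1)/2;
    }
}

static hent heap_pop(hent *h, int *hn) {
    hent top = h[0];
    h[0] = h[--(*hn)];
    for (int k = 0;;) {                          /* sift down */
        int l = 2*k+1, r = 2*k+2, m = k;
        if (l < *hn && h[l].e > h[m].e) m = l;
        if (r < *hn && h[r].e > h[m].e) m = r;
        if (m == k) break;
        hent tmp = h[k]; h[k] = h[m]; h[m] = tmp;
        k = m;
    }
    return top;
}

/* Multiply f*g with a heap of |f| pointers into g (after Johnson 1974). */
static poly mul(poly f, poly g) {
    poly p = { malloc(sizeof(term) * f.n * g.n), 0 };  /* worst case |f||g| terms */
    hent *h = malloc(sizeof(hent) * f.n);
    int hn = 0;
    for (int i = 0; i < f.n; i++)
        heap_push(h, &hn, (hent){ f.t[i].e + g.t[0].e, i, 0 });
    while (hn > 0) {
        hent x = heap_pop(h, &hn);
        long long c = f.t[x.i].c * g.t[x.j].c;
        if (p.n > 0 && p.t[p.n-1].e == x.e)
            p.t[p.n-1].c += c;                   /* merge equal exponents */
        else
            p.t[p.n++] = (term){ c, x.e };
        if (x.j + 1 < g.n)                       /* advance pointer into g */
            heap_push(h, &hn, (hent){ f.t[x.i].e + g.t[x.j+1].e, x.i, x.j+1 });
    }
    int k = 0;                                   /* drop cancelled terms */
    for (int i = 0; i < p.n; i++)
        if (p.t[i].c != 0) p.t[k++] = p.t[i];
    p.n = k;
    free(h);
    return p;
}

int main(void) {
    term ft[3] = {{5, 10}, {3, 4}, {1, 0}};      /* 5x^10 + 3x^4 + 1 */
    term gt[2] = {{2, 7}, {-1, 2}};              /* 2x^7 - x^2       */
    poly p = mul((poly){ft, 3}, (poly){gt, 2});
    for (int i = 0; i < p.n; i++)
        printf("%+lldx^%lu ", p.t[i].c, p.t[i].e);
    printf("\n");
    free(p.t);
    return 0;
}
```

Because each popped entry is replaced by one with an equal or smaller exponent, equal exponents always pop consecutively, which is why merging against only the last emitted term suffices.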
2007
A common way of implementing multivariate polynomial multiplication and division is to represent polynomials as linked lists of terms sorted in a term ordering and to use repeated merging. This results in poor performance on large sparse polynomials.
We present a new algorithm for pseudo-division of sparse multivariate polynomials with integer coefficients. It uses a heap of pointers to simultaneously merge the dividend and partial products, sorting the terms efficiently and delaying all coefficient arithmetic to produce good complexity. The algorithm uses very little memory and we expect it to run in the processor cache. We give benchmarks comparing our implementation to existing computer algebra systems.
ACM Communications in Computer Algebra, 2017
We employ two techniques to dramatically improve Maple's performance on the Fermat benchmarks for simplifying rational expressions. First, we factor expanded polynomials to ensure that gcds are identified and cancelled automatically. Second, we replace all expanded polynomials by new variables and normalize the result. To undo the substitutions, we use a C routine for sparse multivariate division by a set of polynomials. The resulting times for the first Fermat benchmark are 17x faster than Fermat and 39x faster than Magma.
2018
Our goal is to develop a high-performance code for factoring a multivariate polynomial in n variables with integer coefficients which is polynomial time in the sparse case and efficient in the dense case. Maple, Magma, Macsyma, Singular and Mathematica all implement Wang’s multivariate Hensel lifting, which, for sparse polynomials, can be exponential in n. Wang’s algorithm is also highly sequential. In this work we reorganize multivariate Hensel lifting to facilitate a high-performance parallel implementation. We identify multivariate polynomial evaluation and bivariate Hensel lifting as two core components. We have also developed a library of algorithms for polynomial arithmetic which allow us to assign each core an independent task with all the memory it needs in advance so that memory management is eliminated and all important operations operate on dense arrays of 64 bit integers. We have implemented our algorithm and library using Cilk C for the case of two monic factors. We disc...
Lecture Notes in Computer Science, 2004
Eden is a parallel functional language extending Haskell with processes. This paper describes the implementation of an interface between the Eden language and the Maple system. The aim of this effort is to parallelize Maple programs by using Eden as coordination language. The idea is to leave in Maple the computationally intensive functions of the (sequential) algorithm and to use Eden skeletons to set up the parallel process topology in the available parallel machine. A Maple system is instantiated in each processor. Eden processes are responsible for invoking Maple functions with appropriate parameters, getting back the results, and performing all the data communication between processes. The interface provides the following services: instantiating and terminating a Maple system in each processor, performing data conversion between Maple and Haskell objects, invoking Maple functions from Eden, and ensuring mutual exclusion in the access to Maple from different concurrent threads in the local processor. A parallel version of Buchberger's algorithm to compute Gröbner bases is presented to illustrate the use of the interface.
SIAM Journal on Computing, 1983
It is shown that any multivariate polynomial of degree d that can be computed sequentially in C steps can be computed in parallel in O((log d)(log C + log d)) steps using only (Cd)^{O(1)} processors.
Lecture Notes in Computer Science, 2014
The Basic Polynomial Algebra Subprograms (BPAS) provides arithmetic operations (multiplication, division, root isolation, etc.) for univariate and multivariate polynomials over common types of coefficients (prime fields, complex rational numbers, rational functions, etc.). The code is mainly written in CilkPlus [10] targeting multicore processors. The current distribution focuses on dense polynomials and the sparse case is work in progress. A strong emphasis is put on adaptive algorithms as the library aims at supporting a wide variety of situations in terms of problem sizes and available computing resources.
Journal of Mathematical Sciences, 2010
We investigate multiplication algorithms for dense and sparse polynomials and polynomial matrices over different numerical domains and obtain expressions for the complexity of multiplication of polynomials and polynomial matrices understood as the expectation of the number of arithmetic operations. These expressions for a set of parameters of practical interest are tabulated. The results of experiments with the corresponding programs are presented. Bibliography: 8 titles.
We search for techniques to decrease the multiplication time for large sparse polynomials in Lisp by speeding up the sequential accesses of large vectors. We do this by utilizing blocking to improve cache performance, which we show to be effective for sufficiently large problems.
2012
This paper aims to develop and analyze an effective parallel algorithm for multiplying polynomials and power series with integer coefficients. Such operations are of fundamental importance when generating parameters for public key cryptosystems, whereas their effective implementation translates directly into the speed of such algorithms in practical applications. The algorithm has been designed specifically to accelerate the process of generating modular polynomials, but due to its good numerical properties it may surely be used to solve other computational problems as well. The basic idea behind this new method was to adapt it to parallel computing. Nowadays, this is a very important property, as it makes it possible to fully exploit the computing power offered by modern processors. The combination of the Chinese Remainder Theorem and the Fast Fourier Transform made it possible to develop a highly effective multiplication method. Under certain conditions, it is asymptotically faster than the al...
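A minimal C sketch of the CRT half of such a method follows: the factors are multiplied modulo two word-sized primes (schoolbook here, where the paper uses FFTs modulo the primes) and the coefficient images are recombined with the Chinese Remainder Theorem into the symmetric range. The primes, sizes, and helper names are illustrative, it assumes the true coefficients fit in the product of the primes, and it relies on the common __int128 compiler extension.

```c
#include <stdio.h>
#include <stdint.h>

#define N 4                     /* coefficients per input polynomial */
static const int64_t P[2] = { 2147483647, 2147483629 };  /* two primes */

static int64_t mulmod(int64_t a, int64_t b, int64_t p) {
    return (int64_t)((__int128)a * b % p);    /* __int128: GCC/Clang */
}

static int64_t invmod(int64_t a, int64_t p) { /* inverse via Fermat */
    int64_t r = 1, e = p - 2;
    for (a %= p; e; e >>= 1, a = mulmod(a, a, p))
        if (e & 1) r = mulmod(r, a, p);
    return r;
}

int main(void) {
    int64_t f[N] = {3, -1, 4, 1}, g[N] = {2, 7, -1, 8};
    int64_t img[2][2*N-1] = {{0}};

    /* schoolbook product modulo each prime (FFT in the real method) */
    for (int k = 0; k < 2; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                img[k][i+j] = (img[k][i+j] +
                    mulmod((f[i] % P[k] + P[k]) % P[k],
                           (g[j] % P[k] + P[k]) % P[k], P[k])) % P[k];

    /* CRT: lift each coefficient into the symmetric range mod P0*P1 */
    int64_t inv = invmod(P[0] % P[1], P[1]);
    for (int i = 0; i < 2*N-1; i++) {
        int64_t d = mulmod(((img[1][i] - img[0][i]) % P[1] + P[1]) % P[1],
                           inv, P[1]);
        __int128 c = (__int128)d * P[0] + img[0][i];
        __int128 M = (__int128)P[0] * P[1];
        if (c > M/2) c -= M;                  /* recover negative coefs */
        printf("coef x^%d = %lld\n", i, (long long)c);
    }
    return 0;
}
```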
ArXiv, 2017
We set new speed records for multiplying long polynomials over finite fields of characteristic two. Our multiplication algorithm is based on an additive FFT (Fast Fourier Transform) by Lin, Chung, and Huang from 2014, in contrast to previously best results based on multiplicative FFTs. Both methods have similar complexity in terms of arithmetic operations on the underlying finite field; however, our implementation shows that the additive FFT has less overhead. For further optimization, we employ a tower field construction because the multipliers in the additive FFT naturally fall into small subfields, which leads to speed-ups using table-lookup instructions in modern CPUs. Benchmarks show that our method saves about $40 \%$ computing time when multiplying polynomials of $2^{28}$ and $2^{29}$ bits compared to previous multiplicative FFT implementations.
2019
Maple 2019 has a new multivariate polynomial factorization algorithm for factoring polynomials in \(\mathbb {Z}[x_1,x_2,...,x_n]\), that is, polynomials in n variables with integer coefficients. The new algorithm, which we call MTSHL, was developed by the authors at Simon Fraser University. The algorithm and its sub-algorithms have been published in a sequence of papers [3, 4, 5]. It was integrated into the Maple library in early 2018 by Baris Tuncer under a MITACS internship with Maplesoft. MTSHL is now the default factoring algorithm in Maple 2019.
Proceedings of the second international symposium on Parallel symbolic computation - PASCO '97, 1997
We ported the computer algebra system Maple V to the Intel Paragon, a massively parallel, distributed memory machine. In order to take advantage of the parallel architecture, we extended the Maple kernel with a set of message passing primitives based on the Paragon's native message passing library. Using these primitives, we implemented a parallel version of Karatsuba multiplication for univariate polynomials over \(\mathbb{Z}_p\). Our speedup timings illustrate the practicability of our approach. On top of the message passing primitives we have implemented a higher level model of parallel processing based on the manager-worker scheme; a Maple application on one node of the parallel machine submits jobs to Maple processes residing on different nodes, then asynchronously collects the results. This model proves to be convenient for interactive usage of a distributed memory machine. Apart from the message passing parallelism we also use localized multi-threading to achieve symmetric multiprocessing within each node of the Paragon. We combine both approaches and apply them to the multiplication of large bivariate polynomials over small prime fields.
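For reference, here is a minimal sequential C sketch of Karatsuba multiplication for dense univariate polynomials over a small prime field; the paper distributes the recursive calls across Paragon nodes, which this sketch does not attempt. The modulus, cutoff, and function names are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Karatsuba over Z_P: splitting f = f0 + f1*x^k and g = g0 + g1*x^k
   gives f*g from three half-size products:
     f*g = f0*g0 + ((f0+f1)(g0+g1) - f0*g0 - f1*g1)*x^k + f1*g1*x^2k */

#define P 101   /* a small prime modulus, for illustration */

/* r (length 2n-1, zeroed by the caller) += f*g, both of length n */
static void kmul(const int *f, const int *g, int *r, int n) {
    if (n <= 4) {                           /* base case: schoolbook */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                r[i+j] = (r[i+j] + f[i]*g[j]) % P;
        return;
    }
    int k = n/2, m = n - k;
    int fs[m], gs[m], t0[2*k-1], t1[2*m-1], t2[2*m-1];
    memset(t0, 0, sizeof t0); memset(t1, 0, sizeof t1); memset(t2, 0, sizeof t2);
    for (int i = 0; i < m; i++) {
        fs[i] = ((i < k ? f[i] : 0) + f[k+i]) % P;   /* f0 + f1 */
        gs[i] = ((i < k ? g[i] : 0) + g[k+i]) % P;   /* g0 + g1 */
    }
    kmul(f, g, t0, k);                      /* f0*g0 */
    kmul(f+k, g+k, t1, m);                  /* f1*g1 */
    kmul(fs, gs, t2, m);                    /* (f0+f1)(g0+g1) */
    for (int i = 0; i < 2*k-1; i++) {
        r[i] = (r[i] + t0[i]) % P;
        r[i+k] = (r[i+k] + P - t0[i]) % P;  /* subtract f0*g0 at x^k */
    }
    for (int i = 0; i < 2*m-1; i++) {
        r[i+2*k] = (r[i+2*k] + t1[i]) % P;
        r[i+k] = (r[i+k] + P - t1[i] + t2[i]) % P;   /* middle term */
    }
}

int main(void) {
    int f[8] = {1,2,3,4,5,6,7,8}, g[8] = {8,7,6,5,4,3,2,1}, r[15] = {0};
    kmul(f, g, r, 8);
    for (int i = 0; i < 15; i++) printf("%d ", r[i]);
    printf("\n");
    return 0;
}
```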