2010, arXiv
This is a draft of a book about algorithms for performing arithmetic, and their implementation on modern computers. We are concerned with software more than hardware: we do not cover computer architecture or the design of computer hardware. Instead we focus on algorithms for efficiently performing arithmetic operations such as addition, multiplication and division, and their connections to topics such as modular arithmetic, greatest common divisors, the Fast Fourier Transform (FFT), and the computation of elementary and special functions. The algorithms that we present are mainly intended for arbitrary-precision arithmetic. They are not limited by the computer word size, only by the memory and time available for the computation. We consider both integer and real (floating-point) computations. The book is divided into four main chapters, plus an appendix. Our aim is to present the latest developments in a concise manner. At the same time, we provide a self-contained introduction for...
IEEE Transactions on Computers
Computer arithmetic is used in many applications, usually silently (one should keep in mind that even when running programs that are not at all numeric, memory addresses are computed, which involves additions, multiplications, and sometimes divisions). However, in some areas it plays a central role. To give a few examples:
Journal of Computer Science, 2007
IEEE standard 754 floating point is the most common representation used for floating-point numbers, and many computer arithmetic algorithms are developed for basic operations on this standard. In this study, new computer algorithms are proposed to increase the precision range and to solve some problems that arise when using the standard algorithms. These algorithms provide a selectable range of required accuracy (Mega-Digit precision) to meet the needs of new computer applications.
IEEE Transactions on Computers, 2000
Electronic Colloquium on Computational Complexity, 2008
We give an $O(N \log N \cdot 2^{O(\log^* N)})$ algorithm for multiplying two $N$-bit integers that improves on the $O(N \log N \log\log N)$ algorithm of Schönhage-Strassen (SS71). Both of these algorithms use modular arithmetic. Recently, Fürer (Fur07) gave an $O(N \log N \cdot 2^{O(\log^* N)})$ algorithm which, however, uses arithmetic over complex numbers as opposed to modular arithmetic. In this paper, we use multivariate polynomial multiplication...
Proceedings of International Conference on Application Specific Systems, Architectures and Processors: ASAP '96, 1996
We propose in this paper a new algorithm and architecture for performing division in residue number systems. Our algorithm is suitable for residue number systems with large moduli, with the aim of manipulating very large integers on a parallel computer or a special-purpose architecture. The two basic features of our algorithm are, on the one hand, the use of a high-radix division method and, on the other hand, the use of a floating-point arithmetic that runs in parallel with the modular arithmetic.
ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors, 2010
Since the introduction of the Fused Multiply and Add (FMA) in the IEEE 754-2008 standard [6] for floating-point arithmetic, division based on Newton-Raphson iterations has become a viable alternative to SRT-based division. Newton-Raphson iterations were already used in some architectures prior to the revision of the IEEE 754 norm; for example, the Itanium architecture already used this kind of iteration [8]. Unfortunately, the proofs of correctness of the binary algorithms do not extend to the case of decimal floating-point arithmetic. In this paper, we present general methods to prove the correct rounding of division algorithms using Newton-Raphson iterations in software, for radix-2 and radix-10 floating-point arithmetic.
The present paper proposes a new parallel algorithm for the modular division $u/v \bmod \beta^s$, where $u,\; v,\; \beta$ and $s$ are positive integers $(\beta \ge 2)$. The algorithm combines the classical add-and-shift multiplication scheme with a new carry propagation technique. This "Pen and Paper Inverse" ({\em PPI}) algorithm is better suited for systolic parallelization in a least-significant-digit-first pipelined manner. Although it is equivalent to Jebelean's modular division algorithm~\cite{jeb2} in terms of performance (time complexity, work, efficiency), the linear parallelization of the {\em PPI} algorithm improves on the latter when the input size is large. The parallelized versions of the {\em PPI} algorithm lead to various applications, such as the exact division and the digit-modulus operation (dmod) of two long integers. It is also applied to the determination of the periods of rational numbers as well as their $p$-adic expansion in any rad...
Lecture Notes in Computer Science
Fast arithmetic for characteristic-three finite fields $\mathbb{F}_{3^m}$ is desirable in pairing-based cryptography because there is a suitable family of elliptic curves over $\mathbb{F}_{3^m}$ having embedding degree 6. In this paper we present some structure results for Gaussian normal bases of $\mathbb{F}_{3^m}$, and use the results to devise faster multiplication algorithms. We carefully compare multiplication in $\mathbb{F}_{3^m}$ using polynomial bases and Gaussian normal bases. Finally, we compare the speed of encryption and decryption for the Boneh-Franklin and Sakai-Kasahara identity-based encryption schemes at the 128-bit security level, in the case where supersingular elliptic curves with embedding degrees 2, 4 and 6 are employed.
Journal of Mathematical Cryptology, 2009
In this work we reexamine a modular multiplication and a modular exponentiation method. The multiplication method, proposed by Hayashi in 1998, uses knowledge of the factorization of both N + 1 and N + 2 to compute a multiplication modulo N. If both N + 1 and N + 2 can be factored into k equally sized relatively prime factors, then the computations are done modulo each of the factors and then combined using the Chinese Remainder Theorem. It was suggested that the (asymptotic) computational cost of the method is 1/k of that of simply multiplying and reducing modulo N. We show, however, that the method is (asymptotically) at least as costly as simply multiplying and reducing modulo N, for both squarings and general multiplications, when efficient arithmetic is used. The exponentiation method, proposed by Hwang, Su, Yeh and Chen in 2005, is based on Hayashi's method and uses knowledge of the factorization of P + 1 and P − 1 to compute an exponentiation modulo an odd prime P. We begin by showing that the method cannot be used as a general-purpose exponentiation method, and then modify it so that it can work as a general-purpose modular multiplication method. Like Hayashi's method, however, this method is at best (asymptotically) only as efficient as simply multiplying and reducing modulo P.
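The Chinese Remainder Theorem recombination step that this family of methods relies on can be illustrated with a short sketch. This shows only the generic CRT primitive (compute modulo each coprime factor, then reconstruct), not Hayashi's full method; the function name `crt` is ours:

```python
def crt(residues, moduli):
    """Combine residues x mod m_i (pairwise coprime m_i) into x mod prod(m_i)."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # inverse of Mi modulo m (pow with exponent -1 needs Python 3.8+)
        x += r * Mi * pow(Mi, -1, m)
    return x % M

# Example: multiply 12 * 9 modulo N = 35 via the coprime factors 5 and 7.
a, b = 12, 9
residues = [(a * b) % 5, (a * b) % 7]
print(crt(residues, [5, 7]))  # 108 mod 35 = 3
```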
International Journal of Software & Hardware Research in Engineering, 8(2), 27–35., 2020
The article contains a prospectus of the book of the same title [1]. Since the book has been published only in Russian, this prospectus is being published. The book contains 673 pages. The author seeks assistance in publishing the book in English.
2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH), 2016
Some important computational problems must use a floating-point (FP) precision several times higher than the one implemented in hardware. These computations critically rely on software libraries for high-precision FP arithmetic. The representation of a high-precision data type crucially influences the corresponding arithmetic algorithms. Recent work showed that algorithms for FP expansions, that is, a representation based on an unevaluated sum of standard FP types, benefit from various kinds of high-performance support for native FP, such as low latency, high throughput, vectorization, threading, etc. Bailey's QD library and its corresponding Graphics Processing Unit (GPU) version, GQD, are such examples. Despite using native FP arithmetic for the key operations, the QD and GQD algorithms are focused on double-double or quad-double representations and do not generalize efficiently or naturally to a flexible number of components in the FP expansion. In this paper, we introduce a new multiplication algorithm for FP expansions with flexible precision, up to the order of tens of FP elements. The main feature is that the partial products are accumulated in a specially designed data structure that has the regularity of a fixed-point representation while allowing the computation to be carried out naturally using native FP types. This allows us to easily avoid unnecessary computation and to present a rigorous accuracy analysis transparently. The algorithm, its correctness and accuracy proofs, and some performance comparisons with existing libraries are all contributions of this paper.
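The basic building block underlying such FP expansions is the error-free transform: a pair of native FP operations whose rounding error is recovered exactly. A minimal sketch of the classical TwoSum algorithm (a textbook primitive, not the paper's multiplication algorithm; variable names are ours):

```python
def two_sum(a, b):
    """TwoSum error-free transform: return (s, e) with s = fl(a + b)
    and a + b == s + e exactly (assuming no overflow)."""
    s = a + b
    bp = s - a       # the part of b that made it into s
    ap = s - bp      # the part of a that made it into s
    db = b - bp      # what b lost to rounding
    da = a - ap      # what a lost to rounding
    return s, da + db

# The tiny addend is absorbed by rounding, but recovered exactly in e:
s, e = two_sum(1.0, 2.0**-60)
print(s, e)  # 1.0 8.673617379884035e-19  (= 2**-60)
```

An expansion is then an unevaluated sum of such components; chaining error-free transforms propagates the rounding errors down the expansion instead of discarding them.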
The Euclidean division algorithm (EDA) forms a core procedure in elementary number theory, with numerous applications, mainly to: (i) determine the gcd of two integers, and (ii) find the multiplicative inverse of an integer modulo another in order to solve linear Diophantine equations, the CRT, or other needs of modular arithmetic, including its central use in RSA cryptosystems. The present work demonstrates a remarkable application of the technique for multiplying any two integers. The technique is applied directly when the integers being multiplied are distinct, and is used with some modification when the multipliers are identical, where the EDA is otherwise impractical to use. The finding is supported by the known property of Fibonacci-like numbers, wherein the product of any two consecutive Fibonacci or Lucas numbers can be expressed as a sum of squares of lesser terms. A non-recursive Python implementation of the procedure has shown that it can handle multiplication of moderately huge integers having decimal digit sizes (DDS) from tens of thousands up to a million digits. The procedure is inherently suited to multiplying integers of unequal size, a feature that is missing in currently employed multiplication protocols. The finding is expected to have both academic and practical implications in elementary number theory.
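The two classical applications (i) and (ii) mentioned above can be sketched with the extended Euclidean algorithm. This is a textbook implementation, not the paper's multiplication technique; the function names are ours:

```python
def ext_gcd(a, b):
    """Extended Euclidean algorithm: return (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def mod_inverse(a, m):
    """Multiplicative inverse of a modulo m, when gcd(a, m) == 1."""
    g, x, _ = ext_gcd(a, m)
    if g != 1:
        raise ValueError("inverse does not exist")
    return x % m

print(ext_gcd(240, 46))       # (2, -9, 47): 240*(-9) + 46*47 == 2
print(mod_inverse(17, 3120))  # 2753, since 17*2753 % 3120 == 1
```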
1981 IEEE 5th Symposium on Computer Arithmetic (ARITH), 1981
For effective application of on-line arithmetic to practical numerical problems, floating-point algorithms for on-line addition/subtraction and multiplication have been implemented by introducing the notion of quasi-normalization. Those proposed are normalized fixed-precision FLPOL (floating-point on-line) algorithms.
Very Fast Integer Divide, Greatest Common Divisor (GCD), Integer Multiply with Pseudo Code, 2022
In this paper we discuss binary implementations of Integer Divide, Greatest Common Divisor (GCD), and Integer Multiply. Fast binary integer division, a very fast binary GCD, and parallel O(1) addition and subtraction can be implemented in the Arithmetic Logic Unit (ALU) of a CPU to form modern CPUs with very fast multiprecision arithmetic. If the four operations (addition, subtraction, multiplication, and division) are performed in hardware in the ALU of a CPU, then all current software for that Instruction Set Architecture is accelerated transparently, without modification, making a win-win situation possible. These four arithmetic operations are also those of Galois fields, speeding up arithmetic and cryptography. We could also implement an ALU on an FPGA, with suitably high-speed clocks, on an add-on PCI Express card, and that ALU would perhaps be faster than the native CPU's ALU. The add-on hardware accelerator could talk to the CPU directly through a PCI Express bus with a protocol that transfers bytes of multiprecision arithmetic data between them. It could have an initial byte to identify the arithmetic operation and then four or more bytes to signify the length of the data, perhaps similar to the Microsoft WAVE format for digitized sound streams. The ALUs of digital signal processors could also be replaced to make for very fast arithmetic. CPUs with such accelerated ALUs could make real-time artificial intelligence training and pattern recognition possible. Usually, binary integer division is done by successively subtracting the divisor from the dividend (in the schoolbook method) until what remains is less than the divisor; that residue is the remainder, and the quotient is the number of times the divisor was subtracted from the dividend.
A much faster method to perform integer division is as follows (assume for simplicity that both the dividend n and the divisor d are positive integers). At each step, we subtract the maximum allowable amount from the dividend by shifting the divisor left so that the most significant bits of n and the shifted d are aligned: if the binary representation of n has k bits and that of d has p bits, we form m = k − p and the shifted divisor d · 2^m. If this shifted divisor exceeds n, we shift it right by one bit (halve it, decrementing m) so that it no longer does. We then replace n by n − d · 2^m. The quotient q is the running sum of the powers of two 2^m accumulated over the iterations: through successive shifts and subtractions we arrive at a value n < d, at which point we stop and output the quotient q = Σ 2^{m_i} and the remainder r = n.
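The aligned shift-and-subtract scheme just described can be written down directly; a minimal Python sketch (the function name is ours):

```python
def shift_subtract_divide(n, d):
    """Binary long division by aligned shift-and-subtract: repeatedly
    subtract the largest d * 2**m <= n, accumulating 2**m into the quotient.
    Returns (quotient, remainder) for n >= 0, d > 0."""
    if d <= 0 or n < 0:
        raise ValueError("requires n >= 0 and d > 0")
    q = 0
    while n >= d:
        m = n.bit_length() - d.bit_length()  # align most significant bits
        if (d << m) > n:                     # overshot: back off one bit
            m -= 1
        n -= d << m                          # subtract d * 2**m
        q += 1 << m                          # accumulate the power of two
    return q, n                              # n is now the remainder

print(shift_subtract_divide(1000, 7))  # (142, 6)
```

Each iteration removes at least one bit from n, so the loop runs O(k) times rather than the O(n/d) subtractions of the naive schoolbook method.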
Lecture Notes in Computer Science
Informatique théorique et applications, vol. 23, no. 1 (1989), p. 101-111. © AFCET, 1989. ON COMPUTATIONS WITH INTEGER DIVISION, by Bettina Just and Friedhelm Meyer auf der Heide.
Springer eBooks, 2018
An earlier chapter has shown that operations on floating-point numbers are naturally expressed in terms of integer or fixed-point operations on the significand and the exponent. For instance, to obtain the product of two floating-point numbers, one basically multiplies the significands and adds the exponents. However, obtaining the correct rounding of the result may require considerable design effort and the use of nonarithmetic primitives such as leading-zero counters and shifters. This chapter details the implementation of these algorithms in hardware, using digital logic. Describing in full detail all the possible hardware implementations of the needed integer arithmetic primitives is beyond the scope of this book; the interested reader will find this information in textbooks on the subject [345, 483, 187]. After an introduction to the context of hardware floating-point implementation in Section 8.1, we briefly review these primitives in Section 8.2, discuss their cost in terms of area and delay, and then focus on wiring them together in the rest of the chapter. We assume in this chapter that inputs and outputs are encoded according to the IEEE 754-2008 Standard for Floating-Point Arithmetic.
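The multiply-significands, add-exponents view can be illustrated with a toy software model. This is a sketch only, for an assumed P-bit binary format with round-to-nearest-ties-to-even; it ignores signs, special values, and exponent overflow, and is not the hardware algorithm of the chapter:

```python
P = 8  # significand width in bits (an assumption for this sketch)

def fp_mul(m1, e1, m2, e2):
    """Multiply (m1 * 2**e1) * (m2 * 2**e2), where 2**(P-1) <= m < 2**P.
    Returns a normalized, round-to-nearest-even (m, e) pair."""
    m = m1 * m2                    # exact product: 2P-1 or 2P bits
    e = e1 + e2
    shift = m.bit_length() - P     # bits to discard to renormalize
    half = 1 << (shift - 1)
    low = m & ((1 << shift) - 1)   # discarded low bits
    m >>= shift
    # round to nearest, ties to even
    if low > half or (low == half and m & 1):
        m += 1
        if m == 1 << P:            # rounding overflowed the significand
            m >>= 1
            e += 1
    return m, e + shift

# 1.5 * 1.25 = 1.875: significands 192 (= 1.5 * 128) and 160 (= 1.25 * 128)
print(fp_mul(192, -7, 160, -7))  # (240, -7), i.e. 240 * 2**-7 = 1.875
```

The nonarithmetic work the chapter refers to (leading-zero counting, shifting, rounding logic) is exactly what `bit_length`, `>>`, and the tie test stand in for here.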
IFIP Advances in Information and Communication Technology, 1995
Taking into account the various possibilities offered by computer arithmetic during high-level synthesis may help to design much more powerful architectures. The arithmetic operators (mainly in floating-point arithmetic) may be carefully tuned to exactly fit the time and accuracy requirements. This, of course, may have an influence on the synthesis process. In this paper, we mainly address two topics: first, the problem of numerical error control; and then the possible use of redundant number systems, which allow some nice features such as carry-free addition and digit-serial, most-significant-digit-first arithmetic operations. We then propose some potential applications and discuss the new synthesis problems they involve.
Information Processing Letters, 2009
In this paper, some issues concerning the Chinese remainder representation are discussed. Some new conversion methods, including an efficient probabilistic algorithm based on a recent result of von zur Gathen and Shparlinski [5], are described. An efficient refinement of the $NC^1$ division algorithm of Chiu, Davida and Litow [2] is given, in which the number of moduli is reduced by a factor of log n.