
Notes on Low Degree Toom-Cook Multiplication with

Small Characteristic

Marco Bodrato

Centro Interdipartimentale “Vito Volterra” – Università di Roma Tor Vergata


Via Columbia 2 – 00133 Rome, Italy

Abstract. The use of Toom-Cook sub-quadratic polynomial multiplication was recently
shown to be possible even when the coefficient field does not have enough elements,
particularly for F2[x]. This paper focuses on how Toom's strategies can be adapted to
polynomials over non-binary small fields. In particular we describe the Toom-3 algorithm
for Fp[x] with p ∈ {3, 5, 7}, and some other unbalanced algorithms.
Algorithms are described in full detail: not only the asymptotic complexity is given, but
also the exact sequence of operations for possibly optimal implementations.
The algorithms given here for Fp[x] work equally well for Fpn[x], and may be useful for
computations inside Fpn for large enough n.
Moreover, some connections with FFT-based multiplication are found, slightly improving
the worst cases of the FFT.

Keywords: Polynomial multiplication, finite fields, Toom-Cook, Karatsuba, optimal, convolution, squaring.

1 Introduction
In a previous paper [Bod07] the author faced the problem of writing fast multiplication algorithms,
using Toom-Cook splitting strategies [Too63a,Coo66], for polynomials over a field with very few
elements: F2[x]. This is a task some authors believe impossible, because “there are not enough
evaluation points”.
The main idea comes from the fact that Toom-Cook algorithms are efficient only for quite big
polynomials and should be used recursively. That is why we rewrite the operands as polynomials
in Fp[x][Y] with a small degree in Y. After this rewriting we can choose evaluation elements
not only in the field Fp, but in the bigger ring Fp[x].
In this work we focus on fields with characteristic 3, 5 and 7, because binary fields have already
received enough attention, and for bigger primes we can usually adapt the algorithms optimised for Z[x].

2 Toom-Cook Algorithm for Polynomials


We start by recalling Toom's algorithm, as generalised in [Bod07]. Starting from two polynomials
u, v ∈ R[x], over some integral domain R, we want to compute the product w = u · v ∈ R[x]. The
whole algorithm can be described in five steps.
Splitting : Choose some base Y = x^b, and represent u and v by means of two polynomials
u(y, z) = Σ_{i=0}^{n−1} u_i z^{n−1−i} y^i and v(y, z) = Σ_{i=0}^{m−1} v_i z^{m−1−i} y^i, both
homogeneous, with respectively n and m coefficients and degrees deg(u) = n − 1, deg(v) = m − 1,
such that u(x^b, 1) = u and v(x^b, 1) = v. The coefficients u_i, v_i ∈ R[x] are themselves
polynomials and can be chosen to have degrees deg(u_i) < b, deg(v_i) < b for all i.
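The Splitting step above can be illustrated with a short sketch (entirely ours; the little-endian list representation and the helper names `split`/`recompose` are assumptions, not from the paper): a polynomial over F_p is cut into n chunks of b coefficients each, so that concatenating the chunks evaluates the result at Y = x^b.

```python
# Sketch of the Splitting step: u in F_p[x] as a little-endian
# coefficient list is cut into n chunks u_i with deg(u_i) < b,
# so that u = sum_i u_i * x^(b*i), i.e. u(Y, 1) = u with Y = x^b.
def split(u, n, b, p):
    chunks = [u[i * b:(i + 1) * b] for i in range(n)]
    # pad the last chunk so every u_i has exactly b coefficients
    chunks[-1] = chunks[-1] + [0] * (b - len(chunks[-1]))
    return [[c % p for c in ui] for ui in chunks]

def recompose(chunks, b, p):
    # evaluating at Y = x^b just concatenates the chunks back
    out = []
    for ui in chunks:
        out.extend(ui)
    return [c % p for c in out]

u = [1, 2, 0, 1, 2, 2]         # 1 + 2x + x^3 + 2x^4 + 2x^5 over F_3
parts = split(u, 3, 2, 3)      # Toom-3 splitting: n = 3, b = 2
assert recompose(parts, 2, 3) == u
```

This assumes len(u) ≤ n·b; in a real implementation b is chosen from the operand length.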
2 Marco Bodrato (on-line version http: // bodrato. it/ papers/ #CIVV2007 )

Traditionally the Toom-n algorithm requires balanced operands, so that m = n, but we
can easily generalise to unbalanced ones. We assume commutativity, hence we can also assume
n ≥ m > 1.
Evaluation : We want to compute w = u · v, whose degree is d = n + m − 2, so we need
d + 1 = n + m − 1 evaluation points Pd = {(α_0, β_0), . . . , (α_d, β_d)}, where the α_i, β_i ∈ R[x]
can be polynomials. We define c = max_i {deg(α_i), deg(β_i)}.
The evaluation of a single polynomial (for example u) on the points (α_i, β_i) can be computed
with a matrix-by-vector multiplication, where E_{d,n} is the (d + 1) × n Vandermonde-like
matrix whose i-th row is (β_i^{n−1}, α_i β_i^{n−2}, . . . , α_i^{n−2} β_i, α_i^{n−1}):

    (u(α_0, β_0), u(α_1, β_1), . . . , u(α_d, β_d))^T = E_{d,n} · (u_0, u_1, . . . , u_{n−1})^T    (1)

Recursive multiplication : We compute w(α_i, β_i) = u(α_i, β_i) · v(α_i, β_i) for all i, with d + 1
multiplications of polynomials whose degrees are comparable to that of Y = x^b. We have
deg(u(α_i, β_i)) ≤ c(n − 1) + b, deg(v(α_i, β_i)) ≤ c(m − 1) + b, and for the results
deg(w(α_i, β_i)) ≤ c(n + m − 2) + 2b = cd + 2b. We note that c, d, m, n are fixed numbers for a
chosen implementation, while b grows as the operands grow.
Interpolation : This step depends only on the expected degree d of the result and on the d + 1
chosen points (α_i, β_i), no longer on n and m separately. We now need the coefficients of the
polynomial w(y, z) = Σ_{i=0}^{d} w_i z^{d−i} y^i. We know the values of w at d + 1 points, so
we face a classical interpolation problem: we apply the inverse of A_d, the (d + 1) × (d + 1)
Vandermonde-like matrix whose i-th row is (β_i^d, α_i β_i^{d−1}, . . . , α_i^{d−1} β_i, α_i^d):

    (w_0, w_1, . . . , w_d)^T = A_d^{−1} · (w(α_0, β_0), w(α_1, β_1), . . . , w(α_d, β_d))^T    (2)

Recomposition : The desired result can simply be computed with one more evaluation: w =
w(x^b, 1). This step requires at most d shifts and sums.

The two critical phases are evaluation and interpolation. As stated by formulas (1) and (2),
both require a matrix-by-vector multiplication. These two phases can require many sums and
subtractions, shifts, and even small multiplications or exact divisions (interpolation only) by
small elements of R[x]. In this paper we give explicit Evaluation Sequences of operations
(called ES from now on) as well as Interpolation Sequences (IS) for the described Toom algorithms,
paying much attention to giving sequences with the lowest possible computational cost.
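To make the five steps concrete, here is a minimal, self-contained instance over F_p[x] using the smallest case, Toom-2 (Karatsuba) with points {(0, 1), (1, 1), (1, 0)}. This harness is entirely ours and purely illustrative, not one of the paper's optimised sequences.

```python
# Toom-2 (Karatsuba) over F_p[x] as a five-step Toom instance:
# split, evaluate at (0,1), (1,1), (1,0), multiply, interpolate,
# recompose. Polynomials are little-endian coefficient lists.
def poly_mul(a, b, p):                # schoolbook product (recursive base)
    w = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            w[i + j] = (w[i + j] + x * y) % p
    return w

def poly_add(a, b, p, s=1):           # a + s*b, s = -1 gives subtraction
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a)); b = b + [0] * (n - len(b))
    return [(x + s * y) % p for x, y in zip(a, b)]

def toom2(u, v, p, b):
    u0, u1 = u[:b], u[b:]             # splitting: u = u0 + u1*Y, Y = x^b
    v0, v1 = v[:b], v[b:]
    # evaluation and recursive multiplication
    w0 = poly_mul(u0, v0, p)                                      # (0,1)
    w1 = poly_mul(poly_add(u0, u1, p), poly_add(v0, v1, p), p)    # (1,1)
    w2 = poly_mul(u1, v1, p)                                      # (1,0)
    # interpolation: w1 <- w1 - w0 - w2
    w1 = poly_add(w1, poly_add(w0, w2, p), p, -1)
    # recomposition: w0 + w1*Y + w2*Y^2
    out = [0] * (len(u) + len(v) - 1)
    for shift, part in ((0, w0), (b, w1), (2 * b, w2)):
        for i, c in enumerate(part):
            out[shift + i] = (out[shift + i] + c) % p
    return out

u = [1, 2, 0, 1]; v = [2, 1, 1, 2]    # two degree-3 operands over F_3
assert toom2(u, v, 3, 2) == poly_mul(u, v, 3)
```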

3 Searching for Optimality

The two previous papers [BZ07,Bod07] described techniques for the computer-aided search of optimal
Toom-n algorithms. Those techniques are valid for Z, Z[x] and F2[x] with very small adaptations.
Here we will not explain them from scratch, but only highlight the differences to take care of for
different polynomial rings.
In both papers we basically assume we have four operations: addition and subtraction,
together with multiplication and exact division, both by small fixed¹ constants. We also silently
assume that the costs of these operations are linear with respect to operand length; this assumption
is valid for any Fpn[x]. Readers unfamiliar with exact division can find a detailed explanation
in §A. Moreover, we assumed some relations between the costs of the various operations or
some combinations of them, and these relations can be very different for different characteristics.

3.1 Linear Combinations


The basic operation we really need is the linear combination l_i ← ±c_j · l_j/d_j ± c_k · l_k/d_k. We assume
it is possible, without temporary variables, for any small constants c_j, d_j, c_k, d_k ∈ R[x], even if
i = j or i = k. The costs of the various types of linear combinations are carefully classified in [BZ07],
but here we simply compute them by adding up the costs of the single operations, converted to
shifts whenever possible or skipped when the coefficient is trivial.
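A hypothetical rendering of this basic operation in Python, for dense coefficient rows over F_p (all names are assumptions of ours; exact divisions by field constants are realised as multiplications by the modular inverse, as discussed in the next section):

```python
# In-place linear combination l_i <- cj*l_j/d_j + ck*l_k/d_k over F_p,
# on rows of equal length; dj, dk must be invertible mod p.
def lincomb(rows, i, cj, dj, j, ck, dk, k, p):
    fj = cj * pow(dj, p - 2, p) % p      # cj/dj as one field constant
    fk = ck * pow(dk, p - 2, p) % p
    # building a fresh list makes i == j or i == k safe
    rows[i] = [(fj * a + fk * b) % p for a, b in zip(rows[j], rows[k])]

rows = [[0, 0], [1, 4], [2, 3]]
lincomb(rows, 0, 1, 2, 1, -1, 1, 2, 5)   # l0 <- l1/2 - l2  (mod 5)
assert rows[0] == [1, 4]                 # 1/2 = 3 mod 5, so (3-2, 12-3) mod 5
```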

3.2 Cost of Basic Operations


We keep assuming that addition and subtraction cost the same, and we call this cost
add.
We used to assume that exact division by a small constant always costs the same regardless of the
constant, but this is no longer true: if the constant is an invertible element δ ∈ Fpn of
the field, then the division reduces to a multiplication by the inverse δ^{−1} ∈ Fpn, which should
cost far less than exact division by a generic element of Fpn[x]. Vice versa, in Z[x] we could
count on a cost-free multiplication (or division) by the invertible element −1; now we can no
longer count on a simple sign change, since, depending on the representation, multiplication by −1 can
have a very different cost. That is why we admit operations like l1 ← l0 − l1, whose result equals
−(l1 − l0).

Doable shifts. We also assumed that some particular multiplications and divisions could be replaced
by fast shifting: by powers of 2 in Z[x] and by powers of x in F2[x]. Powers of x are simple
multipliers or divisors for any polynomial ring; the case is very different for powers of 2.
Multiplication by 2 cannot cost more than a sum, because we can compute a ← 2 · b as
a ← b + b, and this can even be faster than a generic sum, because we have to read only one
operand. Other powers of 2, instead, cannot be easily optimised, because of the needed reduction.
Division by 2 can be obtained with a multiplication by 2^{−1} ∈ Fp, but if the elements of Fp are
represented in memory as integers in the range [0..p − 1] we can also divide even numbers with a
simple shift, and odd numbers with a shift followed by a correction: the addition of 2^{−1} ∈ Fp. This
strategy gives the correct result because, for odd a,
a · 2^{−1} ≡ (a − 1) · 2^{−1} + 2^{−1} ≡ ((a − 1) ≫ 1) + 2^{−1} (mod p).
This trick also works with Montgomery's representation [Mon85], with only one difference: the 1
in the above formula is not the neutral element for multiplication, but the residue represented by
all zeroes except the least significant bit.
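The halving strategy can be sketched as follows for elements of F_p stored as integers in [0, p−1] (the function name is ours):

```python
# Division by 2 in F_p via shift-with-correction: even values are just
# shifted; odd values are shifted and corrected by adding 2^{-1} mod p.
def halve(a, p):
    half = (p + 1) // 2              # 2^{-1} mod p for any odd prime p
    if a % 2 == 0:
        return a >> 1
    # for 0 <= a < p the result is already reduced below p
    return ((a - 1) >> 1) + half

p = 5
for a in range(p):
    assert halve(a, p) == a * pow(2, p - 2, p) % p
```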

4 The Matrices
Two kinds of matrices are involved in any Toom-k algorithm: the square invertible matrix Ad, and
the two, possibly equal, matrices Ed,n, Ed,m with the same number d + 1 = 2k − 1 of rows, but
fewer (respectively n ≤ d and m ≤ d) columns.
¹ Here “small” means that we can store those constants in a CPU word, but it is not as important an
attribute as “fixed”, meaning asymptotically small; thus the operations are asymptotically linear.

4.1 Matrices for the Interpolation Sequence

Since the matrix from equation (2) must be invertible, we are interested in its determinant,
which can be computed from the points in Pd.
Theorem 1. For the Vandermonde-like matrix Ad generated from the d + 1 points in Pd =
{(α_0, β_0), . . . , (α_d, β_d)}, the determinant is

    det(Ad) = ∏_{0≤i<j≤d} (α_i β_j − α_j β_i)

Proof. It can easily be seen that a matrix with two points having β_i = β_j = 0 is not invertible, and
the above formula correctly gives 0.
If exactly one point, say (α_0, β_0), has β_0 = 0, the matrix will contain the row (0, . . . , 0, α_0^d).
Expanding the determinant along this row, we obtain α_0^d det(Ã_d), where Ã_d is the
complementary minor. Ã_d is a d × d Vandermonde matrix for the points α_i/β_i, in which the i-th
row was multiplied by β_i^d.
Using the classical formula for Vandermonde matrices, we obtain:

    det(Ad) = α_0^d det(Ã_d) = α_0^d ∏_{0<i≤d} β_i^d ∏_{0<i<j≤d} (α_i/β_i − α_j/β_j) = ∏_{0≤i<j≤d} (α_i β_j − α_j β_i)
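Theorem 1 is easy to spot-check numerically over Z; the points below are an arbitrary choice of ours, and the naive Leibniz determinant is adequate only for such tiny matrices:

```python
# Numerical check of Theorem 1: build A_d from d+1 projective points
# and compare det(A_d) with the pairwise product formula.
from itertools import permutations

def det(M):                          # Leibniz formula, exact over Z
    n = len(M); total = 0
    for perm in permutations(range(n)):
        inv = sum(perm[i] > perm[j]
                  for i in range(n) for j in range(i + 1, n))
        term = (-1) ** inv
        for row, col in enumerate(perm):
            term *= M[row][col]
        total += term
    return total

pts = [(0, 1), (1, 1), (-1, 1), (2, 1), (1, 0)]      # d = 4
d = len(pts) - 1
# row i of A_d is (b^d, a*b^(d-1), ..., a^d) for the point (a, b)
A = [[a**k * b**(d - k) for k in range(d + 1)] for a, b in pts]
formula = 1
for i in range(d + 1):
    for j in range(i + 1, d + 1):
        ai, bi = pts[i]; aj, bj = pts[j]
        formula *= ai * bj - aj * bi
assert det(A) == formula
```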

4.2 The Choice of Evaluation Points

The choice of the evaluation points Pd is one of the most important steps towards an optimal
implementation, and it completely determines the matrices Ad, Ed,2, . . . , Ed,d.
We consider two of them as automatically chosen: (0, 1) and (1, 0), representing re-
spectively 0 and ∞, immediately giving w_0 = u_0 · v_0 and w_d = u_{n−1} · v_{m−1}, and the rows
(1, 0, . . .), (. . . , 0, 1). Another good choice is the point (1, 1), and (if the characteristic is ≠ 2) (−1, 1).
We need an invertible matrix Ad, so if we use a point (α_i, β_i), no multiple of it (λα_i, λβ_i)
can be added, otherwise the factor (α_i λβ_i − λα_i β_i) would nullify the determinant.
Since the amount of extra space needed for the carries depends on the parameter
c = max_i {deg(α_i), deg(β_i)}, we try to keep it as small as possible. In F2[x] the use of degree-one
polynomials allows 9 possible couples, so that algorithms up to Toom-5 can be analysed. Even
for higher characteristic we restrict to degree one, because we are not seeking higher Toom
instances anyway.
In Z and in F2[x] we could prove that any Toom-k with k > 2 requires at least one division,
because of the value of the determinant. Working in Fpn[x] we have many invertible elements, and the
determinant of the matrices Ad, even for higher Toom, can be 1 or the inverse of a power of two.
On the other hand, when we use any non-constant polynomial in Fpn[x] \ Fpn, theorem 2 in
[Bod07, p.120] can be used to prove the need for a polynomial division.

4.3 Matrices for the Evaluation Sequence

The matrices Ed,n are non-square, so we cannot speak of a determinant, but we can compute
the rank.
Theorem 2. If the points Pd give an invertible Ad, then the rank of any Ed,n matrix is n.

Proof. Since the Ed,n are sub-matrices of the matrix Ad, modulo some multiplications of rows by
non-zero constants, all the n columns are linearly independent, so that the rank is n.

4.4 Unusual Evaluations


Dealing with general finite fields, one could consider some more general evaluation/interpolation
matrices.
In Z[x] one basically considers two possible evaluations of a polynomial p(x) of degree
d: p(a) with a ∈ Z, or b^d p(a/b). Thus if we have p(x) = p_0 + p_1 x + p_2 x^2 we usually evaluate
p(a) = p_0 + p_1 a + p_2 a^2 or b^2 p(1/b) = b^2 p_0 + p_1 b + p_2.
But in Fpn[x] one can also consider a general β p(α); with the quadratic example above we can
also obtain, for example, a p(a^{−1}) = a p_0 + p_1 + p_2 a^{−1}, giving a row (a, 1, a^{−1}) in the
matrix Ed,3 and a fairly easy evaluation sequence.
For squaring purposes we had better evaluate “both” operands with the same matrix, but
for general products we can even mix evaluations, for example:

    v(α, β) · u(β^{−1}, α^{−1}) = (αβ)^{−n+1} w(α, β)

or, more generally:

    ∀(α_v, β_v, α_u, β_u, γ, δ) ∈ R^6,  α_v β_u = α_u β_v  ⇒  ∃λ : γ v(α_v, β_v) · δ u(α_u, β_u) = λ w(α_v, β_v)

This new kind of mixed evaluation does not affect theorem 2, but it changes the determinant.
Anyway, the author did not find any optimal sequence with this kind of asymmetrical evaluation;
further investigation can be done in this direction.
From now on we will consider only plain evaluations, so that the algorithms can easily be
adapted from products to squarings.

5 Results and Algorithms in Characteristic 3


The algorithms described in the following sections were designed to work in F3[x], but can be
applied in general in characteristic 3. A recent work [AHM07] suggests that arithmetic in F3m
is very fast with a polynomial basis representation: this fact opens the way to experimenting
with the multiplication of elements of those fields with our algorithms. We skip Toom-2 because it
coincides with the well-known Karatsuba algorithm [KO62].
To verify the algorithms proposed in this section with GP/PARI, some lines defining coefficients
in F3 should be pre-inserted:
U0=u0*Mod(1,3) ; U1=u1*Mod(1,3) ; U2=u2*Mod(1,3) ; U3=u3*Mod(1,3)
V0=v0*Mod(1,3) ; V1=v1*Mod(1,3) ; V2=v2*Mod(1,3)

5.1 Choosing Evaluation/Interpolation Points


Possible values for α, β up to degree 1 are: {0, ±1, ±x, ±x + 1, ±x − 1}. Contrary to what happens
in characteristic 2, the square of a binomial is a trinomial. Even a simple evaluation using x ± 1
requires lots of computations; therefore, when possible, it is better to use ±x.

5.2 Toom-2.5 in F3 [x]


The Toom-2.5 algorithm can be used to multiply two operands whose sizes differ: in
particular, one is divided in 3 parts, the other in 2 parts.
Toom-2.5 requires four evaluations, so we need four “values”. In F2[x] we could not just
use elements of the field F2, so we also added the variable x. Here we use the three values we
have in F3, plus ∞, named in our homogenised notation {(1, 0), (1, 1), (−1, 1), (0, 1)}. This choice
is possible for all greater characteristics, and the algorithm described in [BZ07] for Z[x] can be
used. The only small trick possible in F3 is to avoid the division: playing with signs we can
have just one division, by −2 ≡ 1 (mod 3); the code is shown in figure 1.
The matrices are the same for any characteristic ≠ 2. Swapping rows, we can obtain det(A3) = ±2,
as needed.

E3,3 = [1,0,0; 1,1,1; 1,-1,1; 0,0,1]     E3,2 = [1,0; 1,1; -1,1; 0,1]
A3   = [1,0,0,0; 1,1,1,1; -1,1,-1,1; 0,0,0,1]

U = U2*X^2 + U1*X*Y + U0*Y^2
V = V1*X + V0*Y

\\ Evaluation: 5 add; 4 mul (n)
W3 = U0 + U2 ; W0 = V0 + V1
W2 = W3 + U1
W1 = W2 * W0
W3 = W3 - U1 ; W0 = V1 - V0
W2 = W3 * W0
W3 = U2 * V1 ; W0 = U0 * V0
\\ Interpolation: 4 add (2n)
W2 -= W1
W1 -= W2 + W3
W2 -= W0
\\ Recomposition
W = W3*X^3 + W2*X^2*Y + W1*X*Y^2 + W0*Y^3
W == U*V

Fig. 1. Toom-2.5 in F3
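The sequence of figure 1 can be transcribed and tested outside GP/PARI as well; the following Python harness (entirely ours) runs it on random coefficient polynomials over F_3 and compares the result with a schoolbook product:

```python
# Transcription of the Fig. 1 Toom-2.5 sequence over F_3[x];
# polynomials are little-endian coefficient lists of length b.
import random

p, b = 3, 4
def mul(a, c):
    w = [0] * (len(a) + len(c) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(c):
            w[i + j] = (w[i + j] + x * y) % p
    return w
def add(a, c, s=1):                   # a + s*c, coefficient-wise mod 3
    n = max(len(a), len(c))
    a = a + [0] * (n - len(a)); c = c + [0] * (n - len(c))
    return [(x + s * y) % p for x, y in zip(a, c)]

random.seed(1)
U = [[random.randrange(p) for _ in range(b)] for _ in range(3)]  # u0,u1,u2
V = [[random.randrange(p) for _ in range(b)] for _ in range(2)]  # v0,v1
# Evaluation: 5 add; 4 mul
W3 = add(U[0], U[2]); W0 = add(V[0], V[1])
W2 = add(W3, U[1])
W1 = mul(W2, W0)
W3 = add(W3, U[1], -1); W0 = add(V[1], V[0], -1)
W2 = mul(W3, W0)
W3 = mul(U[2], V[1]); W0 = mul(U[0], V[0])
# Interpolation: 4 add
W2 = add(W2, W1, -1)
W1 = add(W1, add(W2, W3), -1)
W2 = add(W2, W0, -1)
# Recomposition at Y = x^b, checked against a schoolbook product
u = U[0] + U[1] + U[2]; v = V[0] + V[1]
res = [0] * (len(u) + len(v) - 1)
for s, part in enumerate([W0, W1, W2, W3]):
    for i, c in enumerate(part):
        res[s * b + i] = (res[s * b + i] + c) % p
assert res == mul(u, v)
```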

5.3 Toom-3 in F3 [x]

Toom-3 is by far the best known and most widely used variant of the Toom-Cook algorithms,
but usually only in characteristic 0, for multiplication in Z or Z[x]. While writing this paper, no
implementation of balanced Toom-3 in F3[x] by other authors was found on the net, nor a detailed
description in the literature. Here we propose one.
Toom-3 has two variants: the balanced one, which is the most interesting, because it can be
used recursively, and the unbalanced one, good when one operand is about twice as big as the other.
The two variants share the same IS but have different evaluation matrices. The balanced version
uses E4,3 twice, while the unbalanced one uses E4,2 for the smaller operand and E4,4 for the
bigger one.
The set of points used is P4 = {(0, 1), (1, −1), (1, 1), (1, x), (1, 0)}; the code is shown in figure 2.
The IS needs one exact division, by the small constant element x^3 − x. This division can
be obtained by a shift and a division by x^2 − 1, possibly replacing the two central lines of the
interpolation with:

W3 = (W0 - W3)/x + W1 + W2*x + W4*x^3
W3 /= (1-x^2)

a new sequence requiring the same number of operations.
Moreover, the point (1, x) can be changed to some different (1, x^w), with a suitable w, maybe
the number of elements of the field stored in each CPU word, so that the shifts cost far less.
The drawback is that the operands get bigger after evaluation, and the recursive multiplications have
to manage longer values. In that case the division will be by x^{3w} − x^w = x^w (x^{2w} − 1), and the
algorithm in §A.3 can be useful.
When working in some F3n, or another extension, an element a ∈ F3n can be used instead of x.
In this case (a^3 − a) needs to be invertible, i.e. a ∉ F3.
E4,3 = [1,0,0; 1,-1,1; 1,1,1; 1,x,x^2; 0,0,1]
E4,4 = [1,0,0,0; 1,-1,1,-1; 1,1,1,1; 1,x,x^2,x^3; 0,0,0,1]
E4,2 = [1,0; 1,-1; 1,1; 1,x; 0,1]
A4   = [1,0,0,0,0; 1,-1,1,-1,1; 1,1,1,1,1; 1,x,x^2,x^3,x^4; 0,0,0,0,1]

Balanced version:
U = U2*Y^2 + U1*Y + U0
V = V2*Y^2 + V1*Y + V0
\\ Evaluation: 10 add, 4 shift; 5 mul (n)
W0 = U0 + U2 ; W2 = V0 + V2
W3 = W0 - U1 ; W4 = W2 + V1
W0 = W0 + U1 ; W2 = W2 - V1
W1 = W3 * W2 ; W2 = W0 * W4
W0 = U0 + (U1 + U2*x)*x
W4 = V0 + (V1 + V2*x)*x
W3 = W0 * W4
W0 = U0 * V0 ; W4 = U2 * V2

Unbalanced version:
U = U3*Y^3 + U2*Y^2 + U1*Y + U0
V = V1*Y + V0
\\ Evaluation: 10 add, 4 shift; 5 mul (n)
W0 = U0 + U2
W1 = U1 + U3
W3 = W0 - W1 ; W4 = V0 + V1
W0 = W0 + W1 ; W2 = V0 - V1
W1 = W3 * W2 ; W2 = W0 * W4
W0 = U0 + (U1 + (U2 + U3*x)*x)*x
W4 = V0 + V1*x
W3 = W0 * W4
W0 = U0 * V0 ; W4 = U3 * V1

Shared interpolation and recomposition:
\\ Interpolation: 9 add, 3 shift, 1 Sdiv (2n)
W1 -= W2
W2 -= W1 + W0 + W4
W3 -= W0 + W1*x + W2*x^2 + W4*x^4
W3 /= (x^3-x)
W1 -= W3
\\ Recomposition
W = W4*Y^4 + W3*Y^3 + W2*Y^2 + W1*Y + W0
W == U*V

Fig. 2. Toom-3 in F3[x]
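The balanced sequence of figure 2 can likewise be checked mechanically. In this Python sketch (the harness and the helper names are ours; only the operation sequence comes from the figure), the exact division by x^3 − x is a classical long division asserting a zero remainder:

```python
# Transcription of the balanced Fig. 2 Toom-3 sequence over F_3[x];
# polynomials are little-endian coefficient lists.
import random

p, b = 3, 5
def mul(a, c):
    w = [0] * (len(a) + len(c) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(c):
            w[i + j] = (w[i + j] + x * y) % p
    return w
def add(a, c, s=1):                    # a + s*c mod 3
    n = max(len(a), len(c))
    a = a + [0] * (n - len(a)); c = c + [0] * (n - len(c))
    return [(x + s * y) % p for x, y in zip(a, c)]
def xpow(a, k):                        # multiply by x^k (a shift)
    return [0] * k + a
def divexact(a, d):                    # exact division in F_3[x]
    a = a[:]; q = [0] * (len(a) - len(d) + 1)
    inv = pow(d[-1], p - 2, p)
    for i in reversed(range(len(q))):
        q[i] = a[i + len(d) - 1] * inv % p
        for j, c in enumerate(d):
            a[i + j] = (a[i + j] - q[i] * c) % p
    assert not any(a)                  # remainder must vanish
    return q

random.seed(3)
U = [[random.randrange(p) for _ in range(b)] for _ in range(3)]
V = [[random.randrange(p) for _ in range(b)] for _ in range(3)]
# Evaluation: 10 add, 4 shift; 5 mul
W0 = add(U[0], U[2]); W2 = add(V[0], V[2])
W3 = add(W0, U[1], -1); W4 = add(W2, V[1])
W0 = add(W0, U[1]); W2 = add(W2, V[1], -1)
W1 = mul(W3, W2); W2 = mul(W0, W4)
W0 = add(U[0], xpow(add(U[1], xpow(U[2], 1)), 1))
W4 = add(V[0], xpow(add(V[1], xpow(V[2], 1)), 1))
W3 = mul(W0, W4)
W0 = mul(U[0], V[0]); W4 = mul(U[2], V[2])
# Interpolation: 9 add, 3 shift, 1 exact division by x^3 - x
W1 = add(W1, W2, -1)
W2 = add(W2, add(W1, add(W0, W4)), -1)
W3 = add(W3, add(W0, add(xpow(W1, 1), add(xpow(W2, 2), xpow(W4, 4)))), -1)
W3 = divexact(W3, [0, 2, 0, 1])        # x^3 - x over F_3
W1 = add(W1, W3, -1)
# Recomposition at Y = x^b against a schoolbook product
u = U[0] + U[1] + U[2]; v = V[0] + V[1] + V[2]
res = [0] * (len(u) + len(v) - 1)
for s, part in enumerate([W0, W1, W2, W3, W4]):
    for i, cc in enumerate(part):
        res[s * b + i] = (res[s * b + i] + cc) % p
assert res == mul(u, v)
```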

6 Results and Algorithms in Characteristic 5

The algorithms described in the following sections were designed to work in F5[x], but can be
applied in general in characteristic 5. Again we skip Toom-2; we could skip Toom-2.5 too, because
it works as in Z[x], but a small consideration is given here to introduce Toom-3.5, which is
described in full detail. The classical and most important Toom-3 is then given. The fact that
x^4 ≡ 1 for all non-zero elements x ∈ F5 gives many cancellations and the shortest sequences,
both for the 3-way and for the unbalanced Toom-3.5.
To verify algorithms proposed in this section with GP/PARI, some lines defining coefficients
in F5 should be pre-inserted:

U0=u0*Mod(1,5); U1=u1*Mod(1,5); U2=u2*Mod(1,5); U3=u3*Mod(1,5); U4=u4*Mod(1,5)
V0=v0*Mod(1,5); V1=v1*Mod(1,5); V2=v2*Mod(1,5)

6.1 Toom-2.5 in F5 [x], FFT-like

The Toom-2.5 algorithm was dissected in §5.2; in F5[x] the algorithm described
in [BZ07] can be used. The only possible trick is to play with signs, obtaining det(A3) = −2, so
that the division is replaced by a product by 2 = −2^{−1}.
With different sets of evaluation points we can even obtain det(A3) = 1 and sequences with
neither multiplications nor divisions of any kind, but they require a greater number of operations,
so we discarded them; after all, a product by two is nothing but one more addition.

6.2 Toom-3 in F5 [x]


With the same choice of points as in Z, the determinant can be 2 or −2 = 2−1 , so that only one
product (or division) by two is needed.
Again we will give the two variants for Toom-3: the balanced and the unbalanced one. In the
code shown for the balanced product, we give two different ES, one for each operands: they are
equivalent with respect to operation count. The preferred one can be chosen during implementa-
tion.
The set of points used is P4 = {(0, 1), (1, −1), (1, 1), (1, 2), (1, 0)}; code is shown in figure 3

E4,3 = [1,0,0; 1,-1,1; 1,1,1; 1,2,-1; 0,0,1]
E4,4 = [1,0,0,0; 1,-1,1,-1; 1,1,1,1; 1,2,-1,-2; 0,0,0,1]
E4,2 = [1,0; 1,-1; 1,1; 1,2; 0,1]
A4   = [1,0,0,0,0; 1,-1,1,-1,1; 1,1,1,1,1; 1,2,-1,-2,1; 0,0,0,0,1]

Balanced version:
U = U2*Y^2 + U1*Y + U0
V = V2*Y^2 + V1*Y + V0
\\ Evaluation: 10 add, 2 shift; 5 mul (n)
W3 = U2 + U0 ; W1 = V2 + V0
W4 = W1 + V1
W0 = W3 + U1
W2 = W0 * W4
W3 = W3 - U1 ; W1 = W1 - V1
W0 = (W0 + U2)*2 ; W4 = V0 + V1*2
W0 = W0 - U0 ; W4 = W4 - V2
W1 = W3 * W1 ; W3 = W0 * W4
W4 = U2 * V2 ; W0 = U0 * V0

Unbalanced version:
U = U3*Y^3 + U2*Y^2 + U1*Y + U0
V = V1*Y + V0
\\ Evaluation: 10 add, 1 shift; 5 mul
W3 = U2 + U0 ; W4 = V0 + V1
W1 = U3 + U1
W0 = W3 + W1
W2 = W0 * W4
W3 = W3 - W1 ; W1 = V0 - V1
W0 = (U1 - U3)*2 ; W4 = W4 + V1
W0 = W0 + U0 - U2
W1 = W3 * W1 ; W3 = W0 * W4
W4 = U3 * V1 ; W0 = U0 * V0

Shared interpolation and recomposition:
\\ Interpolation: 7 add, 2 shift (2n)
W3 -= W1
W1 -= W2 ; W1 *= 2
W2 -= W0 + W4
W3 += W2 * 2
W2 -= W1
W1 -= W3
\\ Recomposition
W = W4*Y^4 + W3*Y^3 + W2*Y^2 + W1*Y + W0
W == U*V

Fig. 3. Toom-3 in F5[x]
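For scalar coefficients (b = 1) the balanced sequence of figure 3 can even be verified exhaustively over F_5; the transcription below is ours, with /2 realised as multiplication by 3 = 2^{−1} mod 5:

```python
# Exhaustive scalar check of the balanced Fig. 3 sequence over F_5.
from itertools import product

p = 5
def toom3_f5(u0, u1, u2, v0, v1, v2):
    W3 = (u2 + u0) % p; W1 = (v2 + v0) % p
    W4 = (W1 + v1) % p
    W0 = (W3 + u1) % p
    W2 = W0 * W4 % p
    W3 = (W3 - u1) % p; W1 = (W1 - v1) % p
    W0 = (W0 + u2) * 2 % p; W4 = (v0 + v1 * 2) % p
    W0 = (W0 - u0) % p;     W4 = (W4 - v2) % p
    W1 = W3 * W1 % p; W3 = W0 * W4 % p
    W4 = u2 * v2 % p; W0 = u0 * v0 % p
    # Interpolation: 7 add, 2 shift
    W3 = (W3 - W1) % p
    W1 = (W1 - W2) * 2 % p
    W2 = (W2 - W0 - W4) % p
    W3 = (W3 + W2 * 2) % p
    W2 = (W2 - W1) % p
    W1 = (W1 - W3) % p
    return [W0, W1, W2, W3, W4]

for u in product(range(p), repeat=3):
    for v in product(range(p), repeat=3):
        ref = [0] * 5
        for i in range(3):
            for j in range(3):
                ref[i + j] = (ref[i + j] + u[i] * v[j]) % p
        assert toom3_f5(*u, *v) == ref
```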

6.3 Toom-3.5 in F5 [x]


The unbalanced Toom-3.5 in F5[x] is interesting for two reasons. First of all, it can be done
using only products and divisions by 2, with a sequence very similar to an FFT; this relationship
will be further investigated in §8. Moreover, it has two versions, the slightly unbalanced one (4 parts
versus 3) and the strongly unbalanced one (5 parts versus 2), with quite interesting evaluation
sequences: in particular, the 5-part evaluation is only a slight modification of the 4-part one,
because the matrix repeats.
The set of points used is P5 = {(0, 1), (1, 2), (1, −2), (1, −1), (1, 1), (1, 0)}. The code in figure 4
uses those points and the resulting matrices.
E5,4 = [1,0,0,0; 1,2,-1,-2; 1,-2,-1,2; 1,-1,1,-1; 1,1,1,1; 0,0,0,1]
E5,3 = [1,0,0; 1,2,-1; 1,-2,-1; 1,-1,1; 1,1,1; 0,0,1]
E5,5 = [1,0,0,0,0; 1,2,-1,-2,1; 1,-2,-1,2,1; 1,-1,1,-1,1; 1,1,1,1,1; 0,0,0,0,1]
E5,2 = [1,0; 1,2; 1,-2; 1,-1; 1,1; 0,1]
A5   = [1,0,0,0,0,0; 1,2,-1,-2,1,2; 1,-2,-1,2,1,-2; 1,-1,1,-1,1,-1; 1,1,1,1,1,1; 0,0,0,0,0,1]

4-by-3 version:
U = U3*Y^3 + U2*Y^2 + U1*Y + U0
V = V2*Y^2 + V1*Y + V0
\\ Evaluation: 14 add, 2 shift; 6 mul
W1 = U0 + U2 ; W5 = V0 + V2
W3 = U1 + U3
W2 = W1 - W3 ; W4 = W5 - V1
W1 = W1 + W3 ; W5 = W5 + V1
W3 = W2 * W4 ; W4 = W1 * W5
W2 = (U1 - U3)*2 ; W0 = V0 + V1*2
W1 = U0 - U2 ; W0 = W0 - V2
W5 = W1 - W2
W2 = W2 + W1
W1 = W2 * W0 ; W0 = W0 + V1
W2 = W5 * W0
W5 = U3 * V2 ; W0 = U0 * V0

5-by-2 version:
U = U4*Y^4 + U3*Y^3 + U2*Y^2 + U1*Y + U0
V = V1*Y + V0
\\ Evaluation: 13 add, 1 shift; 6 mul (n)
W0 = U0 + U4
W1 = W0 + U2
W3 = U1 + U3
W2 = W1 - W3 ; W4 = V0 - V1
W1 = W1 + W3 ; W5 = V0 + V1
W3 = W2 * W4 ; W4 = W1 * W5
W2 = (U1 - U3)*2
W1 = W0 - U2 ; W0 = W5 + V1
W5 = W1 - W2
W2 = W2 + W1
W1 = W2 * W0 ; W0 = W0 + V1
W2 = W5 * W0
W5 = U4 * V1 ; W0 = U0 * V0

Shared interpolation and recomposition:
\\ Interpolation: 10 add, 4 shift (2n)
W4 = (W4 + W3)/2
W3 = W4 - W3
W1 = W1 - W2
W2 = W2 - W1*2
W4 = (W4 + W2)/2
W2 = W4 - W2
W3 = (W3 + W1)/2
W1 = W3 - W1
W1 = W1 - W5
W4 = W4 - W0
\\ Recomposition
W = W5*Y^5 + W4*Y^4 + W3*Y^3 + W2*Y^2 + W1*Y + W0
W == U*V

Fig. 4. Toom-3.5 in F5[x]
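The 4-by-3 sequence of figure 4 admits the same exhaustive scalar check over F_5 (again /2 becomes multiplication by 3; the harness is ours):

```python
# Exhaustive scalar check of the 4-by-3 Fig. 4 sequence over F_5.
from itertools import product

p = 5
def toom35_f5(u0, u1, u2, u3, v0, v1, v2):
    W1 = (u0 + u2) % p; W5 = (v0 + v2) % p
    W3 = (u1 + u3) % p
    W2 = (W1 - W3) % p; W4 = (W5 - v1) % p
    W1 = (W1 + W3) % p; W5 = (W5 + v1) % p
    W3 = W2 * W4 % p; W4 = W1 * W5 % p
    W2 = (u1 - u3) * 2 % p; W0 = (v0 + v1 * 2) % p
    W1 = (u0 - u2) % p; W0 = (W0 - v2) % p
    W5 = (W1 - W2) % p
    W2 = (W2 + W1) % p
    W1 = W2 * W0 % p; W0 = (W0 + v1) % p
    W2 = W5 * W0 % p
    W5 = u3 * v2 % p; W0 = u0 * v0 % p
    # Interpolation: 10 add, 4 shift; /2 is *3 = 2^{-1} mod 5
    W4 = (W4 + W3) * 3 % p
    W3 = (W4 - W3) % p
    W1 = (W1 - W2) % p
    W2 = (W2 - W1 * 2) % p
    W4 = (W4 + W2) * 3 % p
    W2 = (W4 - W2) % p
    W3 = (W3 + W1) * 3 % p
    W1 = (W3 - W1) % p
    W1 = (W1 - W5) % p
    W4 = (W4 - W0) % p
    return [W0, W1, W2, W3, W4, W5]

for u in product(range(p), repeat=4):
    for v in product(range(p), repeat=3):
        ref = [0] * 6
        for i in range(4):
            for j in range(3):
                ref[i + j] = (ref[i + j] + u[i] * v[j]) % p
        assert toom35_f5(*u, *v) == ref
```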

7 Results and Algorithms in Characteristic 7


The algorithms described in the following sections were designed to work in F7[x], but can be
applied in general in characteristic 7. Again we skip Toom-2, and Toom-2.5 too, because they
work as in Z[x]; only some specifications for Toom-3 are given. It would be interesting to analyse
also the 4-way and the unbalanced Toom-4.5, where cancellations due to the relation x^6 ≡ 1
should show up, but this is left for future implementations.
To verify the algorithms proposed in this section with GP/PARI, some lines defining coefficients
in F7 should be pre-inserted:

U0=u0*Mod(1,7) ; U1=u1*Mod(1,7) ; U2=u2*Mod(1,7) ; U3=u3*Mod(1,7)
V0=v0*Mod(1,7) ; V1=v1*Mod(1,7) ; V2=v2*Mod(1,7)

7.1 Toom-3 in F7 [x]

With the same choice of points as in Z, the determinant can be ±2, requiring one shift. Other
choices are possible, even with determinant 1, but no shorter ES were found.
We propose here the sequences for balanced and unbalanced operands with the usual set of
points P4 = {(0, 1), (1, 2), (1, 1), (1, −1), (1, 0)}. The code and the matrices are given in figure 5.

E4,3 = [1,0,0; 1,2,4; 1,1,1; 1,-1,1; 0,0,1]
E4,4 = [1,0,0,0; 1,2,4,1; 1,1,1,1; 1,-1,1,-1; 0,0,0,1]
E4,2 = [1,0; 1,2; 1,1; 1,-1; 0,1]
A4   = [1,0,0,0,0; 1,2,4,1,2; 1,1,1,1,1; 1,-1,1,-1,1; 0,0,0,0,1]

Balanced version:
U = U2*Y^2 + U1*Y + U0
V = V2*Y^2 + V1*Y + V0
\\ Evaluation: 10 add, 2 shift; 5 mul (n)
W1 = U2 + U0 ; W3 = V2 + V0
W0 = W1 + U1 ; W4 = W3 + V1
W2 = W0 * W4
W0 = (W0 + U2)*2 ; W4 = (W4 + V2)*2
W0 = W0 - U0 ; W4 = W4 - V0
W1 = W1 - U1 ; W3 = W3 - V1
W3 = W1 * W3 ; W1 = W0 * W4
W4 = U2 * V2 ; W0 = U0 * V0

Unbalanced version:
U = U3*Y^3 + U2*Y^2 + U1*Y + U0
V = V1*Y + V0
\\ Evaluation: 10 add, 1 shift; 5 mul (n)
W1 = U2 + U0
W3 = U3 + U1
W0 = W1 + W3 ; W4 = V0 + V1
W2 = W0 * W4
W1 = W1 - W3 ; W3 = V0 - V1
W0 = (W0 + U2)*2 ; W4 = W4 + V1
W0 = W0 - U0 - U3
W3 = W1 * W3 ; W1 = W0 * W4
W4 = U3 * V1 ; W0 = U0 * V0

Shared interpolation and recomposition:
\\ Interpolation: 8 add, 2 shift (2n)
W1 -= W2
W3 = (W2 - W3)/2
W2 = W2 - W0 - W3
W1 -= W2
W2 -= W4
W1 -= W2*2
W3 -= W1
\\ Recomposition
W = W4*Y^4 + W3*Y^3 + W2*Y^2 + W1*Y + W0
W == U*V

Fig. 5. Toom-3 in F7[x]
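And similarly for the balanced sequence of figure 5 over F_7, where /2 becomes multiplication by 4 = 2^{−1} mod 7 (transcription and harness are ours):

```python
# Exhaustive scalar check of the balanced Fig. 5 sequence over F_7.
from itertools import product

p = 7
def toom3_f7(u0, u1, u2, v0, v1, v2):
    W1 = (u2 + u0) % p; W3 = (v2 + v0) % p
    W0 = (W1 + u1) % p; W4 = (W3 + v1) % p
    W2 = W0 * W4 % p
    W0 = (W0 + u2) * 2 % p; W4 = (W4 + v2) * 2 % p
    W0 = (W0 - u0) % p; W4 = (W4 - v0) % p
    W1 = (W1 - u1) % p; W3 = (W3 - v1) % p
    W3 = W1 * W3 % p; W1 = W0 * W4 % p
    W4 = u2 * v2 % p; W0 = u0 * v0 % p
    # Interpolation: 8 add, 2 shift; /2 is *4 = 2^{-1} mod 7
    W1 = (W1 - W2) % p
    W3 = (W2 - W3) * 4 % p
    W2 = (W2 - W0 - W3) % p
    W1 = (W1 - W2) % p
    W2 = (W2 - W4) % p
    W1 = (W1 - W2 * 2) % p
    W3 = (W3 - W1) % p
    return [W0, W1, W2, W3, W4]

for u in product(range(p), repeat=3):
    for v in product(range(p), repeat=3):
        ref = [0] * 5
        for i in range(3):
            for j in range(3):
                ref[i + j] = (ref[i + j] + u[i] * v[j]) % p
        assert toom3_f7(*u, *v) == ref
```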

8 Toom and FFT interactions


Toom's ideas for multiplication and the use of the FFT for convolution [SS71] are tightly related:
both are based on the “evaluate, multiply, interpolate” paradigm. When we work on finite fields,
the difference becomes even smaller. In the correct ring, Toom-2.5 and FFT-4 coincide or, more
correctly, NTT-4 is one of the possible Toom-2.5 algorithms. The main difference is that, to
multiply two polynomials in Fpn[x], Toom uses evaluations in the ring itself, while the FFT
approach always maps into some ring extension.

8.1 Toom-3.5 for Any Ring with Imaginary Unit i

The inversion sequence described for Toom-3.5 in F5[x] can be generalised to any characteristic
p ≡ 1 (mod 4) or, more generally, to any integral domain containing ±i, the square roots of
−1 = (±i)^2.
With the six evaluation points (0, 1, −1, i, −i, ∞), we obtain evaluation and interpolation
matrices whose central rows require exactly those operations needed to compute the product
modulo (y^4 − 1) via FFT.

    [ 1  0  0  0  0  0 ]                 [ 1  0  0  0  0  0 ]
    [ 1  1  1  1  1  1 ]                 [ 0  1  0  0  0  1 ]
    [ 1 -1  1 -1  1 -1 ]   --FFT-4-->    [ 0  0  1  0  0  0 ]
    [ 1  i -1 -i  1  i ]  (mod y^4-1)    [ 0  0  0  1  0  0 ]
    [ 1 -i -1  i  1 -i ]                 [ 1  0  0  0  1  0 ]
    [ 0  0  0  0  0  1 ]                 [ 0  0  0  0  0  1 ]

The same can be done if we have ω^4 = −1, ω^2 = ±i, and we use the negacyclic convolution.

    [ 1    0     0     0    0    0   ]                 [  1  0  0  0  0  0 ]
    [ 1    ω    ω^2   ω^3  -1   -ω   ]                 [  0  1  0  0  0 -1 ]
    [ 1   -ω    ω^2  -ω^3  -1    ω   ]   --FFT-4-->    [  0  0  1  0  0  0 ]
    [ 1   ω^3  -ω^2    ω   -1  -ω^3  ]  (mod y^4+1)    [  0  0  0  1  0  0 ]
    [ 1  -ω^3  -ω^2   -ω   -1   ω^3  ]                 [ -1  0  0  0  1  0 ]
    [ 0    0     0     0    0    1   ]                 [  0  0  0  0  0  1 ]
Finally, the two 1's (respectively −1's) can easily be cleared with two subtractions (additions).
The argument can be reversed, using Toom strategies to handle the FFT worst cases. Assume
we need to compute the product c = ab, where a, b ∈ R[x]. If deg(ab) = K − 1 and K is a
good value for the FFT, we can compute c = ab = ab (mod x^K ± 1) very fast, using K point-wise
multiplications. On the other hand, if deg(ab) = K + 1 we have two possible choices: use the FFT
with a bigger K′, or compute C = ab (mod x^K ± 1) as above, then c_0 = a(0)b(0) = a_0 · b_0
and c_{K+1} = a(∞)b(∞) = a_n · b_m; finally correct the coefficients, obtaining c_K = ±(c_0 − C_0),
c_1 = C_1 ± c_{K+1}, and c_i = C_i otherwise. Thus we need K + 2 point-wise multiplications to
obtain K + 2 coefficients, with negligible overhead, while a plain FFT would have required 2K or more.
This approach is quite similar to Granlund's trick described in [GKZ07], but generalises
it using polynomials. This allows us to rebuild the correct coefficients from both (low and high)
ends, and not only from the lowest bits. It can be used only for the first step, not inside the recursion.
In general the FFT strategy [SS71] can be used to compute a total number of coefficients
that is a power of two; thus we can compare Toom-4.5 with FFT-8 and, in general, Toom-(2^k + 1)/2
with FFT-2^k. The argument above also allows FFT-(2^k + 1) and FFT-(2^k + 2), so that Toom-3.5 is
in effect an FFT-(4+2), and the algorithm in §5.2 implements an FFT-(2+2), which is not FFT-4.
The generalisation in F2[x] uses powers of three [Sch77], but again the parallelism works,
comparing Toom-5 with FFT-9, Toom-6 with FFT-(9+2), and so on.

9 Conclusions
We gave several hints on how to search for optimal evaluation and interpolation sequences for
Toom-Cook algorithms, also in the event that more evaluation points are needed than the
base field provides.
The 3-way Toom-Cook algorithm was given in full detail for all the significant characteristics; for
bigger ones (eleven and more) the sequences described for integers in [BZ07] can be used.
A final trick to enhance the worst cases of the FFT multiplication strategies was suggested;
this too can be applied in every characteristic with a very small overhead.

Acknowledgements
The author wants to thank Marica Tarantino for her patience which made this work possible;
Alberto Zanoni, Paul Zimmermann, Jörg Arndt and Richard Brent for helpful discussions.

References
AHM07. Omran Ahmadi, Darrel Hankerson, and Alfred Menezes, Software implementation of arithmetic
in F3m , in Carlet and Sunar [CS07].
Arn07. Jörg Arndt, Algorithms for programmers (working title), http://www.jjj.de/fxt/, June 2007,
Draft version.
Bod07. Marco Bodrato, Towards optimal Toom-Cook multiplication for univariate and multivariate
polynomials in characteristic 2 and 0, in Carlet and Sunar [CS07], http://bodrato.it/papers/
#WAIFI2007, pp. 116–133.
Bro07. Christopher W. Brown (ed.), ISSAC 2007 - International Symposium on Symbolic and Algebraic
Computation, Waterloo, Ontario, Canada, July 29-August 1, 2007, ACM press, July 2007.
BZ07. Marco Bodrato and Alberto Zanoni, Integer and polynomial multiplication: Towards optimal
Toom-Cook matrices, in Brown [Bro07], http://bodrato.it/papers/#ISSAC2007.
Coo66. Stephen A. Cook, On the minimum computation time of functions, Ph.D. thesis, Dept. of
Mathematics, Harvard University, 1966.
CS07. Claude Carlet and Berk Sunar (eds.), WAIFI 2007 - International Workshop on the Arithmetic
of Finite fields, Madrid, Spain, June 21-22, 2007, Lecture Notes in Computer Science, vol. 4547,
Springer, June 2007.
GKZ07. Pierrick Gaudry, Alexander Kruppa, and Paul Zimmermann, A GMP-based implementation of
Schönhage-Strassen's large integer multiplication algorithm, in Brown [Bro07].
Jeb93. Tudor Jebelean, An algorithm for exact division, Journal of Symbolic Computation 15 (1993),
169–180.
KO62. Anatolii Alekseevich Karatsuba and Yurii Ofman, Umnozhenie mnogoznachnykh chisel na
avtomatakh, Doklady Akademii Nauk SSSR 145 (1962), no. 2, 293–294 (Russian), English translation
in [KO63].
KO63. Anatoly Alexeevich Karatsuba and Yuri Ofman, Multiplication of multidigit numbers on au-
tomata, Soviet Physics Doklady 7 (1963), 595–596.
Mon85. Peter L. Montgomery, Modular multiplication without trial division, Mathematics of Computa-
tion 44 (1985), no. 170, 519–521.
SS71. Arnold Schönhage and Volker Strassen, Schnelle Multiplikation großer Zahlen, Computing 7 (1971),
281–292.
Sch77. Arnold Schönhage, Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2,
Acta Informatica 7 (1977), 395–398.
Too63a. Andrei Leonovich Toom, O slozhnosti skhemy iz funktsional'nykh elementov, realizuyushchei
umnozhenie tselykh chisel, Doklady Akademii Nauk SSSR 150 (1963), no. 3,
496–498 (Russian), English translation in [Too63b].
Too63b. Andrei Leonovich Toom, The complexity of a scheme of functional elements realizing the mul-
tiplication of integers, Soviet Mathematics Doklady 3 (1963), 714–716.
ZQ03. Paul Zimmermann and Michael Quercia, irred-ntl source code, 2003, http://www.loria.fr/~zimmerma/irred/.
Notes on low degree Toom-Cook multiplication with small characteristic 13

A Exact Division
The cost of exact division by a fixed element of the ring (or field) is linear in the degree of the dividend.
The first reference the author found for a proof of this fact is the paper by Jebelean [Jeb93]. Code
by Quercia [ZQ03] for exact division by 1 + x^2 in F2[x] gave the author the idea for the generalisation
to polynomial rings. Finally, Arndt's description of inversion for power series [Arn07] gave hints for a clean
explanation of the procedure.

A.1 Exact Division by 1 ± xn


The divisor we most frequently have to divide by is D = 1 ± x^n, so in this section we assume
we have an element A ∈ A[x] of any polynomial ring which is known to be a multiple of our
divisor D. Then we know the quotient Q ∈ A[x] exists, so that A = Q · D. We want to compute
the element Q.
We can at first compute the degree d = deg(A) of the polynomial A; this immediately gives
us the degree of Q: deg(Q) = d − n.
To compute Q we start by computing A_0, then we recurse:

A_0 ← A · (1 ∓ x^n) = Q(1 ± x^n)(1 ∓ x^n) = Q(1 − x^(2n))

A_1 ← A_0 · (1 + x^(2n)) = Q(1 − x^(2n))(1 + x^(2n)) = Q(1 − x^(4n))

...

A_k ← A_{k−1} · (1 + x^(n·2^k)) = Q(1 − x^(n·2^(k+1)))      (3)

until n · 2^(k+1) > d − n; then A_k ≡ Q (mod x^(d−n)) and we can easily extract Q from the lowest
coefficients of A_k.
Since we need the result modulo x^(d−n), all the computations can be performed modulo that
polynomial. Each step then requires a shift and an addition of at most d elements of A. The
number of necessary steps is 1 + ⌊log_2((d − n)/n)⌋ = O(log_2 d).
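A direct Python transcription of this doubling scheme follows (the helper name and conventions are this sketch's, not the paper's): coefficient lists are low-to-high, `sign` selects D = 1 + sign·x^n, and everything is computed modulo x^(d−n+1), so the returned list is exactly Q.

```python
def exact_div_one_pm_xn(A, n, sign):
    # divide A (a multiple of D = 1 + sign*x^n, sign = +1 or -1)
    # by D, working modulo x^(d-n+1) where d = deg(A)
    d = len(A) - 1
    qlen = d - n + 1              # Q has degree d - n
    Ak = list(A[:qlen])           # truncate: we only need Q's coefficients
    step, s = n, -sign            # first factor is 1 - sign*x^n
    while step < qlen:
        # multiply in place by 1 + s*x^step, in descending index order
        # so each source coefficient is still unmodified when read
        for i in range(qlen - 1, step - 1, -1):
            Ak[i] += s * Ak[i - step]
        step *= 2                 # exponents double: n, 2n, 4n, ...
        s = 1                     # all later factors are 1 + x^(n*2^k)
    return Ak
```

For example, dividing 1 + 2x + 2x^2 + 2x^3 + 2x^4 + x^5 by 1 + x yields 1 + x + x^2 + x^3 + x^4.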
Another possible way to clear the factor 1 ± x^n from A = Q(1 ± x^n) is to compute
Ã_0 ← A · (1 ∓ x^n + x^(2n)) = Q(1 ± x^n)(1 ∓ x^n + x^(2n)) = Q(1 ± x^(3n)), and recursively Ã_k until n · 3^(k+1) > d − n.
This is usually slower, because each step requires two additions and only gains a factor 3 in the
exponent. One or two steps of this computation can be useful anyway, whenever a factor 2
is not enough while a 3 is, or 8 is not and 9 is. As an example:

(1 − x^3)^(−1) ≡ (1 + x^3)(1 + x^6)(1 + x^(12))(1 + x^(24)) ≡ (1 + x^3 + x^6)(1 + x^9)(1 + x^(18)) (mod x^(32))

in both cases an additional factor, resp. (1 + x^(48)) or (1 + x^(36)), gives the inverse modulo x^(64).
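Both chains are easy to check mechanically; here is a throwaway Python verification with truncated products over the integers (the helper names are ad hoc):

```python
def pmul_mod(a, b, N):
    # product of two coefficient lists, truncated modulo x^N
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
    return c

def binom(e, N, coef=1):
    # the truncated polynomial 1 + coef*x^e
    p = [0] * N
    p[0] = 1
    if e < N:
        p[e] += coef
    return p

N = 32
one = [1] + [0] * (N - 1)
inv2 = one                              # (1+x^3)(1+x^6)(1+x^12)(1+x^24)
for e in (3, 6, 12, 24):
    inv2 = pmul_mod(inv2, binom(e, N), N)
tri = binom(3, N)
tri[6] += 1                             # 1 + x^3 + x^6
inv3 = pmul_mod(pmul_mod(tri, binom(9, N), N), binom(18, N), N)
# both chains are inverses of 1 - x^3 modulo x^32:
assert pmul_mod(binom(3, N, -1), inv2, N) == one
assert pmul_mod(binom(3, N, -1), inv3, N) == one
```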

A.2 Exact Division with Linear Complexity


Assume we have a representation of the finite field where each CPU word stores w coefficients,
and a_0, …, a_n are the words such that A = Σ_{i=0}^{n} a_i · x^(iw) and ∀ i, deg(a_i) < w. Moreover, we assume
that the divisor D is invertible modulo x^w. Then we can implement exact division with linear
complexity; the algorithm follows.

for i = 0 … n − 1
    a_i ← a_i · D^(−1) (mod x^w)
    a_{i+1} ← a_{i+1} − (a_i · D) / x^w
When D = 1 ± x^n, multiplication by the inverse D^(−1) can be optimised as above, and

(a_i · D) / x^w = a_i / x^w ± (a_i · x^n) / x^w = ± a_i / x^(w−n) = ± a_i ≫ (w − n)

reduces to one shift operation by (w − n) positions.


The very last step, a_{n+1} ← a_{n+1} − a_n ≫ (w − n), should be avoided; anyway, it can have
no effect at all: if the division is exact, we must have a_n ≫ (w − n) = 0.
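As a concrete (purely illustrative) instance, the loop can be written for F2[x] with a tiny word size W = 8, using carry-less multiplication and a standard char-2 Newton iteration to invert D modulo x^W; over F2 the subtraction is an XOR. All names below are hypothetical:

```python
W = 8                       # coefficients (bits) per word, small for readability
MASK = (1 << W) - 1

def clmul(a, b):
    # carry-less product of two GF(2)[x] polynomials packed into integers
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def inv_mod_xw(D):
    # char-2 Newton iteration I <- I*(D*I), doubling the precision each
    # round; requires D(0) = 1, i.e. an odd word
    assert D & 1
    inv, prec = 1, 1
    while prec < W:
        prec *= 2
        mask = (1 << prec) - 1
        inv = clmul(inv, clmul(D, inv) & mask) & mask
    return inv

def exact_div_words(a, D):
    # word-wise exact division in GF(2)[x]: `a` is a list of W-bit words,
    # low word first, assumed to be an exact multiple of D (deg D < W)
    Dinv = inv_mod_xw(D)
    q = list(a)
    for i in range(len(q) - 1):
        q[i] = clmul(q[i], Dinv) & MASK        # a_i <- a_i * D^-1 mod x^W
        q[i + 1] ^= clmul(q[i], D) >> W        # a_{i+1} -= (a_i * D) / x^W
    q[-1] = clmul(q[-1], Dinv) & MASK
    return q
```

For example, with D = 1 + x^2 (the word 0b101), `exact_div_words([0x75, 0x19], 0b101)` returns `[0xB1, 0x07]`.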

A.3 Exact Division with Same Cost of an Addition


The above algorithm greatly simplifies if D = 1 ± x^(kw), since then D^(−1) ≡ 1 (mod x^w); the loop reduces to:

for i = 0 … (n − k)
    a_{i+k} ← a_{i+k} ∓ a_i
This requires one word addition per word. The last k steps of this loop should all
give zero; they can be used as a check or simply skipped, because we started from a multiple A
of D. With (n + 1) words, only (n + 1 − 2k) steps are then necessary: 2k fewer than those needed for
an addition, and here we just read/write one operand. The cost of this division can therefore be even less
than the cost of an addition.
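In F2[x], where ∓ is also an XOR, the whole loop is just the following sketch (`words` holds the n + 1 machine words of A, low word first; the name is hypothetical):

```python
def exact_div_1_plus_xkw(words, k):
    # divide an exact multiple of 1 + x^(k*w) by it, word-wise;
    # over F2, addition and subtraction are both XOR
    q = list(words)
    for i in range(len(q) - k):
        q[i + k] ^= q[i]
    # exactness forces the top k words to zero (the "check" steps, which
    # could also simply be skipped); drop them: the quotient is k words shorter
    assert all(t == 0 for t in q[-k:])
    return q[:-k]
```

For example, if the quotient has words [3, 5, 7] and D = 1 + x^w (k = 1), the dividend's words are [3, 3^5, 5^7, 7] = [3, 6, 2, 7], and the function recovers [3, 5, 7].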

B Toom-k Complexity
Asymptotic complexity of Toom-k algorithm to multiply two degree-d polynomials has been
analysed many times. Here we recall and reorganise the results, with the aim of computing the
number of operations tightly, showing the influence of the optimisations in the linear part on the
usually implied constant.
We will call M (d) the number of operations needed to multiply two polynomials with d
coefficients. All Toom-k algorithms require some linear algebra, for evaluation and interpolation
phases; all operations used in those steps are linear in the number of coefficients d, so we will
condense the cost of all of them into c_k · d, with a constant c_k depending on the specific Toom-k method.
One step of a multiplication method splitting operands in k parts, and requiring m sub-
products and c_k · d linear operations, will cost

M_k(d) = m · M(d/k) + c_k · d

operations. We will use the well-known relations

α^(log_b β) = β^(log_b α) ,    Σ_{i=0}^{n} α^i = (α^(n+1) − 1)/(α − 1) ,
to compute the cost of r recursions of the same algorithm:


M_k^r(d) = m^r · M(d/k^r) + d · c_k · Σ_{i=0}^{r−1} (m/k)^i = m^r · M(d/k^r) + d · c_k · ((m/k)^r − 1)/(m/k − 1)

= m^r · M(d/k^r) + d · c_k · (k/(m − k)) · ((m/k)^r − 1)

= m^r · M(d/k^r) + c_k · (k/(m − k)) · (d · m^r / k^r) − d · c_k · (k/(m − k))

= m^r · [ M(d/k^r) + (d/k^r) · c_k · (k/(m − k)) ] − d · c_k · (k/(m − k)) .
Then we analyse the asymptotic behaviour of the Toom-k algorithm. Let m = 2k − 1, and assume
we use all possible recursions, r = log_k d; then in the above formula m^r = d^(log_k m), k^r = d, and
k/(m − k) = k/(k − 1):

M_k(d) = d^(log_k m) · ( M(1) + c_k · k/(k − 1) ) − d · c_k · k/(k − 1) = Θ(d^(log_k(2k−1))) .
THIS PAGE IS NOT PART OF THE ARTICLE²
BibTEX entry

@TechReport{Bodrato:CIVV2007,
author = {Marco Bodrato},
title = {Notes on Low Degree {Toom-Cook} Multiplication with Small Characteristic},
institution ={Centro "Vito Volterra", Universit\`a di Roma "Tor Vergata"},
year = {2007},
number = {621},
pages = {14},
month = {December},
note = {\url{http://bodrato.it/papers/\#CIVV2007}},
}

² Paper edited with Emacs-21 and LaTeX-3.0 on a Debian GNU/Linux box.
