Code Generators For Mathematical Functions
Abstract—A typical floating-point environment includes support for a small set of about 30 mathematical functions such as exponential, logarithms and trigonometric functions. These functions are provided by mathematical software libraries (libm), typically in IEEE 754 single, double and quad precision.
This article suggests replacing this libm paradigm by a more general approach: the on-demand generation of numerical function code, on arbitrary domains and with arbitrary accuracies.
First, such code generation opens up the libm function space available to programmers. It may capture a much wider set of functions, and may capture even standard functions on non-standard domains and accuracy/performance points.
Second, writing libm code requires fine-tuned instruction selection and scheduling for performance, and sophisticated floating-point techniques for accuracy. Automating this task through code generation improves confidence in the code while enabling better design-space exploration, and therefore better time to market, even for the libm functions.
This article discusses, with examples, the new challenges of this paradigm shift, and presents the current state of open-source function code generators.
Index Terms—elementary functions, code generator, range reduction, reconstruction, interpolation.

I. INTRODUCTION & MOTIVATION

A. Standard Mathematical Libraries

The standard mathematical library (libm) provides, in a small set of precisions (single and double precision), a small set of mathematical functions:
• elementary functions [1] such as exponential and logarithm, sine, cosine, tangent and their inverses, hyperbolic functions and their inverses;
• and other "special" functions useful in various computing contexts, such as the power function x^y, erf, Γ or the Airy function [2].
Strictly speaking, the libm is specified in the C language standard (currently ISO/IEC 9899:2011). Among the other languages, some (C++, Fortran, Python) use the same library, and some redefine it in a more or less compatible way. The 2008 revision of the IEEE 754 floating-point standard [3] has attempted a common standardization. In addition, language standards also specify elementary functions on complex numbers, but for historical reasons not in the "mathematical library" section. Also, many operating systems or libraries offer, for convenience, more functions than strictly required by the libm. In this work we understand the term "libm" in its widest sense and address all these functions.

B. Performance of a libm

Performance of libm functions is of critical importance, in particular in scientific and financial computing. For example, profiling the SPICE electronic simulator shows that it spends most of its time in the evaluation of elementary functions [4]. The same holds for large-scale simulation and analysis code run at CERN [5]–[7].
The problem is that the optimal implementation of each function is very dependent on the technology. Mathematical support is provided by a combination of hardware and software, and the optimal repartition between hardware and software has evolved with time [8]. For instance, the first 80287 mathematical coprocessor, in 1985, included support for a range of elementary functions (albeit in microcode) in addition to the basic operations. It was later found that software could outperform such microcode. For instance, as memory got cheaper, large tables of precomputed values [9], [10] could be used to speed up the computation of a function. Furthermore, progress in compiler technology allowed for a deeper integration of elementary function software in the overall compilation process [11]–[13], another case for software-based functions. Today, even the relevance of hardware division and square root is disputed. However, at the same time, the table-based algorithms of the 90s are being replaced with high-degree polynomials [12] that behave better in the context of current highly parallel, memory-constrained multicores.

C. Limits of the library approach

1) Productivity of libm development: Writing a libm requires fine-tuned instruction selection and scheduling for performance, and sophisticated floating-point techniques for accuracy. Re-optimization of these mutually dependent goals for each new processor is a time-consuming and error-prone task. Besides, it is desirable that each function comes in several variants corresponding to a range of constraints on performance (e.g. optimized for throughput or optimized for latency) or accuracy. Some processor and system manufacturers (Intel, AMD, ARM, NVIDIA, HP, Apple) therefore employ large teams of engineers dedicated to libm maintenance.
With more limited manpower, the open-source mathematical libraries (most notably in the GNU glibc [14] and in Newlib, https://sourceware.org/newlib/) lag behind in performance, all the more as they have to support a much wider range of processors. However, the most accurate implementations are found in open-source efforts, with several correctly rounded functions provided by libraries such as IBM LibUltim [15] (now in the glibc) and more recently CRLibm [16].
2) Limited choice: Another problem is that the libm functions do not match all the needs of users.
First, the choice is limited. For instance, the web page http://project-mathlibs.web.cern.ch/project-mathlibs/mathTable.html mentions all the functions that are used by CERN simulations.
Only a fraction of this list is offered by the standard libm. The rest is programmed when needed, sometimes by programmers who lack the expertise of libm developers. This leads to duplicated efforts and, often, poorer quality (performance- or accuracy-wise) than libm functions.
Second, even the functions in the libm are often too good (and therefore slower than they could be) for an application. In recent standards, the range is specified as "as far as the input and output formats allow", while the accuracy is specified as "as accurate as the output format allows". This is the best possible specification for a generic library function, but it is overkill for many applications. If a programmer knows that the input to a cosine will remain within a period, and that about three decimal digits of accuracy will be enough considering the accuracy of the inputs, the libm cosf function is too accurate (7 decimal digits) on too large a range.
The present article claims that the solution to both previous issues (productivity and limited choice) is to automate libm development so that functions can be generated on demand for a wider set of contexts. This solution has already proven effective for other classes of mathematical software, most notably the ATLAS [17] project for linear algebra, and the FFTW [18] and SPIRAL [19] projects for fast Fourier transforms.

D. Use cases for generators of mathematical functions

We are currently focusing on two main use cases where we need a libm generator, both illustrated by Figure 1.
The first one targets the widest audience of programmers. It is a push-button approach that will try to generate code on a given domain and for a given precision for an arbitrary univariate function with continuous derivatives up to some order. The function may be provided as a mathematical expression, or even as an external library that is used as a black box. We call this approach the open-ended approach, in the sense that the function that can be input to the generator is arbitrary – which does not mean that the generator will always succeed in handling it. Section III will describe how this generator has evolved from simple polynomial approximations to the generation of more sophisticated evaluation schemes, including attempts at range reduction. Here, the criterion of success is that the generated code is better than whatever other approach the programmer would have to use (composition of libm functions, numerical integration if the function is defined by an integral, etc.). "Better" may mean faster, or more accurate, or better behaved in corner cases, etc.
Ideally, we would like a generator with a clear separation (as in Figure 1) between a front-end building an approximation scheme, and a back-end implementing it on a given target technology. A first challenge is that the front-end itself must be directed by the target context. This is easy to understand with table-based range reduction techniques, for instance: their relevance and efficiency greatly depend on the context. Formally capturing this complexity is currently out of reach. Fortunately, it is always possible to perform an empirical exploration of the parameter space: generate several variants, compile and time them, and pick the best.
We believe that the techniques presented in Section III can eventually be extended to capture all the functions of C11 on their whole range. However, this is currently not the case. There is a lot of human expertise in libm development that we are not yet able to automate. In particular, bivariate functions such as atan2 or pow, as well as some special functions, are currently out of reach. The second use case therefore focuses on assisting people who have this expertise, not yet on replacing them. It targets a much narrower audience of programmers, those in charge of providing the actual libm functionality to an operating system or compiler. Here the criterion of success is that the generated code is of comparable quality to hand-written code, but obtained much faster.
The chosen approach, reviewed in Section IV, consists in giving the libm developer the keys to the back-end. We thus aim at offering him/her a development framework in which he/she may describe the relevant evaluation schemes. The challenge here is to find a proper balance between two incompatible goals: 1/ this framework should be able to capture every trick that libm developers use, otherwise they will not adopt it; 2/ it should nevertheless raise the level of abstraction, so that a single description generates code for the variety of code flavors and targets we want to address. It should also be fully scriptable to enable design-space exploration, but always under full control of the libm developer. Section IV reviews a technical solution that matches these goals.
This second use case can be viewed as a pragmatic, bottom-up approach, where we embed existing hand-crafted code in a framework to make it more generic. The first, open-ended use case is more ambitious, more high-level, and top-down from the most abstract mathematical description of a function. These two approaches do meet, as illustrated in Figure 1, and we do not claim that there is a clear border between them. For instance, the libm developer working directly with the back-end will nevertheless invoke the evaluation scheme generator to solve sub-problems, typically after range reduction.

II. BACKGROUND IN ELEMENTARY FUNCTION IMPLEMENTATION

In order to understand the challenge of writing a code generator for mathematical functions, it is useful to know how they are implemented manually. To evaluate a function at some point, we may use a finite number of additions, multiplications, floating-point number manipulations, comparisons and precomputed tables. A generic technique along these rules is to approximate a function by tables and polynomials or piecewise polynomials. After filtering out special values like NaNs and infinities [3], there are usually three steps: argument reduction, polynomial approximation and reconstruction [1]. The following details these steps.

A. Polynomial approximation

A minimax polynomial approximation of a function f on a domain I minimizes the approximation error ε_approx = ||f − p||∞ over I among all the polynomials p of a given degree d. The Remez algorithm converges to a minimax polynomial. However, this algorithm produces real coefficients, which may not be representable with finite-precision numbers. Rounding these coefficients to floating-point numbers entails a loss of accuracy. This problem is solved in [20] by a modified Remez algorithm that finds a minimax-like polynomial among polynomials with floating-point coefficients.
These algorithms input f, I and the degree d, and return p and ε_approx. For a given f and I, the larger the degree d, the better the approximation (the smaller ε_approx). When implementing a libm-like mathematical function, we have an upper bound constraint on ε_approx, and we look for the smallest d that satisfies this bound. This approximation-error-driven search for d is slightly harder than classical polynomial approximation.
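To make this degree selection concrete, here is a minimal, self-contained C sketch of the idea (it is not part of the tools described in this article): it stands in truncated Taylor polynomials of exp for the floating-point minimax polynomials of [20], estimates the error by dense sampling instead of the safe supremum norms of [27], and uses a deliberately loose toy target of 2^−30, since reaching ε̄ = 2^−53 requires the genuine minimax machinery. All names and constants in the sketch are illustrative assumptions.

    #include <math.h>
    #include <stdio.h>

    /* Error of the degree-d Taylor polynomial of exp on [a,b], estimated by dense
     * sampling (a real generator would use a safe supremum-norm algorithm). */
    static double max_error_exp_taylor(int d, double a, double b) {
        double err = 0.0;
        for (int i = 0; i <= 10000; i++) {
            double x = a + (b - a) * i / 10000.0;
            double p = 0.0, term = 1.0;                 /* accumulate sum of x^k / k! */
            for (int k = 0; k <= d; k++) { p += term; term *= x / (k + 1); }
            double e = fabs(p - exp(x));
            if (e > err) err = e;
        }
        return err;
    }

    int main(void) {
        const double a = 0.0, b = 0.3;       /* implementation domain I            */
        const double eps_bar = 0x1p-30;      /* loose toy accuracy target          */
        const int d_max = 9;                 /* maximum allowed polynomial degree  */
        for (int d = 1; d <= d_max; d++) {   /* look for the smallest suitable d   */
            double eps = max_error_exp_taylor(d, a, b);
            printf("degree %d: estimated error %.3g\n", d, eps);
            if (eps <= eps_bar) { printf("selected degree: %d\n", d); break; }
        }
        return 0;
    }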
B. Argument reduction

When it is not possible to compute a polynomial of a small enough degree, the implementation domain I may be reduced. There are two basic techniques for this: split the domain into smaller subdomains, or use specific properties of the function to be implemented. In both cases the approximation problem is reduced to approximating a (possibly different) function f_r on a smaller domain.
1) Domain splitting: The main idea here is to partition I into subdomains, so that on each of them we can find a minimax polynomial approximation of small degree. The splitting can be uniform, logarithmic [21], or arbitrary [22]. In the first two cases the polynomial is selected based on a few bits of the input x; in the last case, several if-else statements are needed.
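As an illustration only (this is not code emitted by any of the generators discussed here), the following C sketch shows the typical shape of a uniformly split piecewise-polynomial evaluation: the subinterval index is read off a few leading bits of the input, and each subinterval carries its own low-degree polynomial. For self-containment the polynomials are degree-2 Taylor expansions of exp at the subinterval midpoints; a generator would store minimax coefficients instead.

    #include <math.h>
    #include <stdio.h>

    #define NSPLIT 8                     /* uniform split of [0,1) into 8 subintervals   */
    #define DEGREE 2                     /* degree of the polynomial on each subinterval */

    static double coeff[NSPLIT][DEGREE + 1];   /* one small polynomial per subinterval */
    static double center[NSPLIT];

    /* Fill the tables; a generator would compute fpminimax-style coefficients instead
     * of the degree-2 Taylor expansion of exp used here for self-containment. */
    static void build_tables(void) {
        for (int i = 0; i < NSPLIT; i++) {
            double m = (i + 0.5) / NSPLIT;       /* midpoint of subinterval i */
            center[i] = m;
            coeff[i][0] = exp(m);
            coeff[i][1] = exp(m);
            coeff[i][2] = exp(m) / 2.0;
        }
    }

    /* Piecewise evaluation for x in [0,1): the index comes from a few leading bits. */
    static double exp_split(double x) {
        int i = (int)(x * NSPLIT);               /* subinterval index                 */
        double t = x - center[i];                /* local variable on the subinterval */
        double p = coeff[i][DEGREE];
        for (int k = DEGREE - 1; k >= 0; k--)    /* Horner evaluation                 */
            p = p * t + coeff[i][k];
        return p;
    }

    int main(void) {
        build_tables();
        for (double x = 0.05; x < 1.0; x += 0.3)
            printf("x=%.2f  approx=%.8f  exp=%.8f\n", x, exp_split(x), exp(x));
        return 0;
    }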
2) Function-specific argument reduction: For some elementary functions, specific algorithms of argument reduction may be applied [1], [23]–[25]. They are based on mathematical properties such as b^x · b^y = b^(x+y). For example, the exponential function can be computed as follows [23]:

    e^x = 2^E · 2^(k−E) · 2^(x·log2(e) − k),

with a parameter w ∈ ℕ, k = ⌊2^w · x·log2(e)⌋ · 2^(−w) and E = ⌊k⌋. Thus E ∈ ℤ and k ∈ 2^(−w)·ℤ. Finally, writing the small quantity r = x − k·ln(2), we get

    e^x = 2^E · 2^(k−E) · e^r,

where the values of 2^(k−E) are stored in a table of size 2^w. The task is thus reduced to an approximation of e^r, where r takes values in a small range. This approximation can be computed with one polynomial or with an additional splitting if needed.
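The following C sketch illustrates this table-driven reduction under simplifying assumptions (w = 3, a short Taylor polynomial standing in for the minimax polynomial, no special-case handling, no compensated arithmetic); it is an illustration of the scheme above, not code produced by the generator.

    #include <math.h>
    #include <stdio.h>

    #define W 3                                    /* table of size 2^W            */
    static double tbl[1 << W];                     /* tbl[j] = 2^(j / 2^W)         */

    static void build_table(void) {
        for (int j = 0; j < (1 << W); j++)
            tbl[j] = pow(2.0, (double)j / (1 << W));
    }

    static double exp_reduced(double x) {
        const double LOG2E = 1.4426950408889634;   /* log2(e) */
        const double LN2   = 0.6931471805599453;   /* ln(2)   */
        /* k is a multiple of 2^-W close to x*log2(e); E is its integer part. */
        double k = floor(x * LOG2E * (1 << W)) / (1 << W);
        int    E = (int)floor(k);
        int    j = (int)((k - E) * (1 << W));      /* table index for 2^(k-E)      */
        double r = x - k * LN2;                    /* reduced argument, 0 <= r < ln(2)/2^W */
        /* Degree-5 Taylor polynomial for e^r (a minimax polynomial in real code). */
        double p = 1.0 + r*(1.0 + r*(0.5 + r*(1.0/6 + r*(1.0/24 + r*(1.0/120)))));
        return ldexp(tbl[j] * p, E);               /* reconstruction: 2^E * 2^(k-E) * e^r */
    }

    int main(void) {
        build_table();
        for (double x = 0.0; x <= 5.0; x += 1.25)
            printf("x=%.2f  approx=%.6f  exp=%.6f\n", x, exp_reduced(x), exp(x));
        return 0;
    }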
Sollya provides state-of-the-art polynomial approximation [20] and safe algorithms to compute ε_approx = ||f − p||∞ [27]; it also provides a scripting language.
Gappa [16] is a formal proof assistant that is able to manage the accumulation of floating-point errors in most libm codes. Compared to [16], in the present work the Gappa proof scripts are not written by hand, but generated along with the C code. Interestingly, Gappa is itself a generator (it generates Coq or HOL formal proofs).

III. APPROXIMATION TECHNIQUES FOR BLACKBOX FUNCTIONS

A. Overview

This section describes an open-source code generator written in Sollya. It inputs a parametrization file and produces the corresponding C code. The parametrization file includes the function f, its implementation domain I, the desired accuracy ε̄, the maximum degree d_max for the approximation polynomials, etc. The function may be specified as a mathematical expression, or as an external library. An important feature of this approach is that the tool does not need an explicit formula for the specified function. It only accesses the function as a numerical blackbox.
All the tool requires from this blackbox is to be able to return an arbitrarily accurate numerical evaluation of the function and of its first few derivatives over an interval. This way, the tool will work for functions described not only as closed-form expressions, but also as integrals, inverse functions, etc. The function should however be reasonably fast to evaluate.
This blackbox encapsulation of the function does not prevent the tool from exploiting the function-specific range reductions presented in Section II-B. The following will show that the required mathematical properties can be detected and exploited from a blackbox function.
The tool also supports higher-than-double accuracy, using double-double and triple-double arithmetic.

B. Different levels of code generation

We can structure the function generation process in Metalibm-open in three levels.
a) First level (polynomial approximation): Relatively "simple" function flavors are treated on the first level. The generator starts by checking whether the maximum degree d_max is sufficient to approximate the function f by a polynomial with an error bounded by ε̄. If so, C code and its Gappa proof are generated. Otherwise, the second or third levels are used.
b) Second level (piecewise polynomial approximation): This level uses domain splitting and piecewise polynomial approximation. More details can be found in [22].
c) Third level (exploiting algebraic properties): On the third level of function flavor generation we try to detect mathematical properties and to use them for argument reduction. The properties currently detected include

    f(x + y) = f(x)·f(y)
    f(x + C) = f(x)
    f(x) + f(y) = f(x·y)
    f(−x) = f(x)
    f(−x) = −f(x)

They respectively correspond to exponential functions (b^(x+y) = b^x·b^y), periodic functions (e.g. sin(x + 2πk) = sin(x)), logarithms (log_b(x) + log_b(y) = log_b(x·y)) and symmetric functions (both even and odd). When such a property is detected, the tool uses the corresponding argument reduction. It then needs an approximation scheme on the reduced domain, for which it performs a recursive call to the first or second level.
As the tool only accesses the function as a blackbox, it is unable to prove that the property is actually true. However, all it needs is to ensure that it is true up to some accuracy (the accuracy needed for the range reduction to work). This can be done purely numerically.
As an example, let us show how to detect the property f(x + y) = f(x)·f(y), which corresponds to the family of exponential functions b^(x+y) = b^x·b^y for some base b ∈ ℝ. First, two different points x and y are chosen in I, and the tool checks whether there exists |ε| < ε̄ such that f(x + y) = f(x)·f(y)·(1 + ε). If not, the property is not true. If yes, the tool deduces a candidate b = exp(ln(f(x))/x) for some random x ∈ I. Only in this case does it check that the property holds to the required accuracy, by computing

    ε̃ = || b^x / f(x) − 1 ||∞ over I

and checking whether ε̃ ≤ ε̄. Other properties can be detected in the same way. However, handling the error bound ε̃ requires some analysis for certain function properties.
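The detection just described can be mimicked in a few lines of C, shown below purely as an illustration: the actual tool performs these evaluations in Sollya with arbitrary precision and interval enclosures, whereas this sketch uses double precision, a blackbox hard-wired to f(x) = 3^x, a toy tolerance of 2^−40 and plain sampling for the final norm. All of these choices are assumptions made for the sake of a runnable example.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* The blackbox function: the detector only calls it and never looks inside. */
    static double blackbox(double x) { return pow(3.0, x); }

    int main(void) {
        const double lo = 0.0, hi = 0.5;         /* domain I                         */
        const double eps_bar = 0x1p-40;          /* illustrative tolerance           */
        srand(42);

        /* Step 1: quick rejection test at two points x, y of I. */
        double x = 0.1, y = 0.3;
        double eps = blackbox(x + y) / (blackbox(x) * blackbox(y)) - 1.0;
        if (fabs(eps) >= eps_bar) { printf("property f(x+y)=f(x)f(y) rejected\n"); return 0; }

        /* Step 2: deduce a candidate base b at a random point (kept away from 0). */
        double xr = 0.1 + (hi - 0.1) * rand() / (double)RAND_MAX;
        double b  = exp(log(blackbox(xr)) / xr);

        /* Step 3: check || b^x / f(x) - 1 || over I, here by dense sampling. */
        double err = 0.0;
        for (int i = 0; i <= 1000; i++) {
            double t = lo + (hi - lo) * i / 1000.0;
            double e = fabs(pow(b, t) / blackbox(t) - 1.0);
            if (e > err) err = e;
        }
        printf("candidate base b = %.15g, max relative deviation = %.3g\n", b, err);
        printf(err <= eps_bar ? "exponential property confirmed\n"
                              : "property not confirmed to the required accuracy\n");
        return 0;
    }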
C. Reconstruction

The final step in the implementation of a mathematical function is reconstruction, i.e. the sequence of operations (or just a formula) that has to be executed to evaluate the function at some point x of the initial domain I. The reconstruction may be tricky if the generation was done on the second level and the produced code has to be vectorizable. To make the reconstruction vectorizable, a solution is to exhibit some mapping function m(x) that returns the index of the corresponding interval [28].
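As a hedged illustration of such a mapping (not taken from [28] and not the generator's output), the C sketch below uses a logarithmic split of [1, 16) into binades: m(x) is simply the binary exponent of x, obtained without branching through ilogb, so the per-subinterval lookup can be vectorized instead of compiling to an if-else chain. The coarse degree-2 polynomials and the function choice (log) are only there to keep the sketch short and self-contained.

    #include <math.h>
    #include <stdio.h>

    #define NSPLIT 4                     /* logarithmic split of [1,16): [2^i, 2^(i+1)) */
    static double c0[NSPLIT], c1[NSPLIT], c2[NSPLIT], mid[NSPLIT];

    static void build_tables(void) {
        for (int i = 0; i < NSPLIT; i++) {
            double m = 1.5 * (1 << i);           /* midpoint of [2^i, 2^(i+1))        */
            mid[i] = m;
            c0[i] = log(m);                      /* degree-2 Taylor of log at m       */
            c1[i] = 1.0 / m;                     /* (a generator would use minimax)   */
            c2[i] = -0.5 / (m * m);
        }
    }

    /* Branch-free mapping m(x): the subinterval index is the binary exponent of x. */
    static int m_of_x(double x) { return ilogb(x); }

    static double log_split(double x) {
        int i = m_of_x(x);
        double t = x - mid[i];
        return c0[i] + t * (c1[i] + t * c2[i]);
    }

    int main(void) {
        build_tables();
        for (double x = 1.1; x < 16.0; x += 3.7)
            printf("x=%5.2f  approx=%.4f  log=%.4f\n", x, log_split(x), log(x));
        return 0;
    }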
D. Several examples of code generation

The first two examples correspond to f = exp with the target accuracy ε̄ = 2^(−53); the polynomial degree is bounded by d_max = 9. Then we test a toy composite function and a sigmoid function as used in neural networks.
1) For f = exp on the small domain I = [0, 0.3], the first level is enough. The generated code only consists of the polynomial coefficients and a polynomial evaluation function. The function generator does not handle special cases for the moment (infinities, NaNs, overflows, etc.), so for larger domains this filtering has to be added manually; this is addressed in Section IV. This function flavor is about 1.5 times faster than the standard exp function from the glibc libm.
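To give an idea of the shape of such first-level code, here is a hedged C mock-up: a static coefficient table and a Horner evaluation routine. For self-containment the table holds the degree-9 Taylor coefficients of exp, whereas the generated code embeds fpminimax coefficients together with a Gappa-proved error bound; the names and layout are illustrative assumptions.

    #include <math.h>
    #include <stdio.h>

    /* Degree-9 polynomial for exp on [0, 0.3]. Taylor coefficients 1/k! are used
     * here for self-containment; generated code would embed minimax coefficients. */
    static const double P[10] = {
        1.0, 1.0, 0.5, 1.0/6, 1.0/24, 1.0/120, 1.0/720, 1.0/5040, 1.0/40320, 1.0/362880
    };

    static double exp_poly(double x) {
        double r = P[9];
        for (int k = 8; k >= 0; k--)      /* Horner scheme */
            r = r * x + P[k];
        return r;
    }

    int main(void) {
        for (int i = 0; i <= 3; i++) {
            double x = i * 0.1;
            printf("x=%.1f  poly=%.17g  exp=%.17g\n", x, exp_poly(x), exp(x));
        }
        return 0;
    }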
2) For exponential flavors on larger domains, the table-driven algorithm for argument reduction mentioned in Section II is used [23]. We enlarge the domain from the previous example to I = [0, 5] and set w = 4 for the table (the table size is 2^w). The generation is performed on the third level: the family of exponential functions is detected and the domain is reduced to [−log(2)/2^(w+1), log(2)/2^(w+1)]. Then a recursive call to the first level of code generation is performed. The generated code for this flavor executes in between 10 and 60 machine cycles, while the upper bound for the libm code is 80 machine cycles. However, the generated code computes most inputs in less than 25 cycles, while the libm exponential executes in between 15 and 35 cycles.
3) A more interesting example is f(x) = sin(cos(x/3) − 1) on the domain I = [−1, 1] with 48 bits of accuracy. The standard libm does not support composite functions: there will be two function calls (both with special-case handling) and a division. In the generated code the composite function is approximated by polynomials. Even though the target error is less than the mantissa length of double precision, the generated code wins both in accuracy and performance. Due to cancellation at zero, the error of the libm composite function explodes (Figure 3) while the error of the generated function stays bounded (Figure 2). Execution of the generated code takes between 8 and 470 machine cycles, while execution of the libm analog of this flavor takes more than 150 cycles and may reach 650 cycles in the worst case.
4) Another example is the sigmoid function f(x) = 1/(1 + e^(−x)) on the domain I = [−2, 2] with 52 correct bits. No algebraic property is detected, so the generation is done on the second level. The generated code and the libm code are of comparable accuracy and performance: execution takes between 25 and 250 cycles, with most cases done within 50 cycles. The polynomial degree for the generation is bounded by d_max = 9, and the domain was split into 22 subintervals.

E. Open-ended code generation: wrap-up and outlook

As demonstrated by the examples shown in the last section and by its own 10300 lines of code, open-ended code generation has reached a certain level of maturity. It is able to quickly generate code for functions defined by various means, including black-box definitions. The produced codes do not yet reach the performance a manual implementation would eventually reach, but they are available now, at reduced cost. The final accuracy is bounded by construction and, in the case of composite functions, better than for plain libm function composition.

A. General overview

All the framework is implemented in Python, a language chosen to ensure that the framework is fully scriptable. More importantly, an evaluation scheme is itself described in Python. Technically, one first defines a Python variable as being the output of the generated code; then all computations that eventually affect this variable are considered as belonging to the evaluation scheme.
A similar mechanism enables the designer to embed, in the same Python code, the analysis and proof of numerical properties of the evaluation scheme. First, intervals may be described and manipulated directly in Python using interval arithmetic. This is useful to script range and error analysis. The fact that interval computations are scripted in Python is a non-automatic, but practical, way out of the correlation issues that plague naive interval arithmetic. Second, pure mathematical expressions may also be described, still in the same classical Python syntax. They can be used to describe what the code is supposed to compute, thus enabling delegation to Sollya of the computation of ε_approx, and to Gappa of the accumulation of rounding errors [16].
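The C sketch below is only meant to illustrate the kind of directed-rounding interval arithmetic such range analysis relies on (the framework itself scripts it in Python): each operation rounds the lower bound downward and the upper bound upward, so the computed enclosure remains safe under floating-point rounding. The types and helper names are assumptions of this sketch, and strict compilers may additionally require FENV_ACCESS to be enabled.

    #include <fenv.h>
    #include <stdio.h>

    typedef struct { double lo, hi; } interval;

    /* Addition with outward rounding: lower bound rounded down, upper bound up. */
    static interval iadd(interval a, interval b) {
        interval r;
        fesetround(FE_DOWNWARD); r.lo = a.lo + b.lo;
        fesetround(FE_UPWARD);   r.hi = a.hi + b.hi;
        fesetround(FE_TONEAREST);
        return r;
    }

    /* Multiplication of intervals with non-negative bounds (enough for this sketch). */
    static interval imul_pos(interval a, interval b) {
        interval r;
        fesetround(FE_DOWNWARD); r.lo = a.lo * b.lo;
        fesetround(FE_UPWARD);   r.hi = a.hi * b.hi;
        fesetround(FE_TONEAREST);
        return r;
    }

    int main(void) {
        interval x = {0.0, 0.3};                      /* range of the input    */
        interval c = {0.5, 0.5};                      /* a (point) coefficient */
        interval t = iadd(imul_pos(c, x), (interval){1.0, 1.0});  /* 1 + c*x   */
        printf("enclosure of 1 + 0.5*x on [0,0.3]: [%.17g, %.17g]\n", t.lo, t.hi);
        return 0;
    }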
This may seem like a lot of overloading of the Python syntax. Below all this, we also keep standard Python to script the code generation, in several steps.
the first periods, but avoid over-testing the very small numbers (which represent almost half of the floating-point numbers) where sin x ≈ x. Etc.
The proposed framework therefore defines a class for a C11 function with its test infrastructure. Instances of this class (one per function) may overload some methods of this class, such as the generator of regression tests (the default tests cover numbers such as the signed zeroes, infinities and NaNs, and the extreme threshold values of the subnormal and normal domains) and one or a few generators of random inputs.
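A minimal C sketch of such a default test generator is given below, purely as an illustration of the idea (the actual infrastructure is a Python class whose methods are overloaded per function): it lists the directed special values and then draws a few random inputs from a function-specific range, here the first periods of sine. The ranges and counts are illustrative assumptions.

    #include <float.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Default directed test values shared by every C11 function under test. */
        double directed[] = {
            +0.0, -0.0, INFINITY, -INFINITY, NAN,
            DBL_TRUE_MIN,          /* smallest positive subnormal (C11) */
            DBL_MIN,               /* smallest positive normal          */
            DBL_MAX                /* largest finite double             */
        };
        size_t n = sizeof directed / sizeof directed[0];
        for (size_t i = 0; i < n; i++)
            printf("directed: %.17g\n", directed[i]);

        /* Function-specific random inputs: e.g. for sin, sample the first periods. */
        const double PI = 3.141592653589793;
        srand(1);
        for (int i = 0; i < 5; i++) {
            double x = 4.0 * PI * rand() / (double)RAND_MAX;
            printf("random:   %.17g\n", x);
        }
        return 0;
    }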
E. Wrapping up: current state of the project

The backend code generator (processor classes, generic optimization steps and proof generators) currently consists of 3800 lines of Python code (counted by the CLOC utility) for the open-source part covering the generic and x86-optimized processor classes. In addition there are also a few proprietary files related to the Kalray processor.
The C11 function generators currently amount to 2482 lines of code. This includes exp, log, log1p, log2, and sin/cos at various degrees of completion (they generate working code, but not always of performance comparable to handwritten code). There are also inverse, division, inverse square root, and square root implementations, essentially intended for processors without a hardware divider, such as the IA64 or the Kalray. Inverse approximation is actually of more general use and can replace standard division in situations where it is needed with accuracy lower or higher than the floating-point precision. A typical function generator consists of 200-300 lines of code. Considering that the generated code is typically 100-200 lines, that the Gappa proof is about the same size, and that one generator produces many code flavors, the code generator approach should be much easier to maintain.
Near-term future work includes adding ARM processors and more functions. Medium-term future work includes an OpenCL backend to target GPUs (increasingly used as floating-point accelerators), more evaluation scheme optimization using CGPE [29], and the generation of correctly rounded function flavors.

V. CONCLUSIONS AND FUTURE WORK

This article discusses two approaches to addressing the challenges faced by libm developers. The first is automated generation of evaluation schemes, to address the large number of functions of interest. The second is a specific development framework, to address the large number of hardware targets of interest. These two approaches were developed independently, and current work focuses on integrating them further.
A good case study for working on such integration is the correct rounding of elementary functions. It presents several challenges, such as function evaluation in larger-than-standard precisions or less common formats, or the vectorization of a technique centered on a run-time test [15].
In addition, both approaches, as well as the final integrated code generator, should also integrate notions of code tuning through code timing, as in software packages such as ATLAS. This allows for a simpler modeling of processors, or the handling of (out-of-order) processor architectures where the processor behavior is too hard to model anyway.
Finally, even though the codes produced by our two approaches are already hardened through the use of Gappa as a formal proof tool, integration of code generation into certified FP code development suites such as Why, Frama-C or even model checkers might become of some interest.
REFERENCES

[1] J.-M. Muller, Elementary Functions: Algorithms and Implementation. Secaucus, NJ, USA: Birkhäuser Boston, Inc., 1997.
[2] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions, ser. Applied Math. Series 55. National Bureau of Standards, Washington, D.C., 1964.
[3] "IEEE standard for floating-point arithmetic," IEEE 754-2008, also ISO/IEC/IEEE 60559:2011, Aug. 2008.
[4] N. Kapre and A. DeHon, "Accelerating SPICE model-evaluation using FPGAs," Field-Programmable Custom Computing Machines, pp. 37–44, 2009.
[5] V. Innocente, "Floating point in experimental HEP data processing," in 2nd CERN Openlab/INTEL Workshop on Numerical Computing, 2012.
[6] P. Danilo, "The VDT mathematical library," in 2nd CERN Openlab/INTEL Workshop on Numerical Computing, 2012.
[7] J. Apostolakis, A. Buckley, A. Dotti, and Z. Marshall, "Final report of the ATLAS detector simulation performance assessment group," CERN/PH/SFT, CERN-LCGAPP-2010-01, 2010. [Online]. Available: http://sftweb.cern.ch/AAdocuments
[8] G. Paul and M. W. Wilson, "Should the elementary functions be incorporated into computer instruction sets?" ACM Transactions on Mathematical Software, vol. 2, no. 2, pp. 132–142, 1976.
[9] P. T. P. Tang, "Table-driven implementation of the exponential function in IEEE floating-point arithmetic," ACM Transactions on Mathematical Software, vol. 15, no. 2, pp. 144–157, 1989.
[10] S. Gal and B. Bachelis, "An accurate elementary mathematical library for the IEEE floating point standard," ACM Transactions on Mathematical Software, vol. 17, no. 1, pp. 26–45, 1991.
[11] P. Markstein, IA-64 and Elementary Functions: Speed and Precision, ser. Hewlett-Packard Professional Books. Prentice Hall, 2000.
[12] M. Cornea, J. Harrison, and P. T. P. Tang, Scientific Computing on Itanium-based Systems. Intel Press, 2002.
[13] P. Markstein, "Accelerating sine and cosine evaluation with compiler assistance," in 16th Symposium on Computer Arithmetic, J.-C. Bajard and M. Schulte, Eds. IEEE Computer Society, 2003, pp. 137–140.
[14] S. Loosemore, R. M. Stallman, et al., The GNU C Library Reference Manual. Free Software Foundation, Inc.
[15] A. Ziv, "Fast evaluation of elementary mathematical functions with correctly rounded last bit," ACM Transactions on Mathematical Software, vol. 17, no. 3, pp. 410–423, 1991.
[16] F. de Dinechin, C. Q. Lauter, and G. Melquiond, "Certifying the floating-point implementation of an elementary function using Gappa," IEEE Transactions on Computers, vol. 60, no. 2, pp. 242–253, 2011. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/TC.2010.128
[17] R. C. Whaley and A. Petitet, "Minimizing development and maintenance costs in supporting persistently optimized BLAS," Software: Practice and Experience, vol. 35, no. 2, pp. 101–121, February 2005, http://www.cs.utsa.edu/~whaley/papers/spercw04.ps.
[18] M. Frigo and S. G. Johnson, "The design and implementation of FFTW3," Proceedings of the IEEE, vol. 93, no. 2, pp. 216–231, 2005, special issue on "Program Generation, Optimization, and Platform Adaptation".
[19] M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: Code generation for DSP transforms," Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", vol. 93, no. 2, pp. 232–275, 2005.
[20] N. Brisebarre and S. Chevillard, "Efficient polynomial L∞-approximations," in 18th IEEE Symposium on Computer Arithmetic (ARITH-18), Montpellier, France, June 2007, pp. 169–176. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/ARITH.2007.17
[21] D.-U. Lee, P. Cheung, W. Luk, and J. Villasenor, "Hierarchical segmentation schemes for function evaluation," IEEE Transactions on VLSI Systems, vol. 17, no. 1, 2009.