Code Generators For Mathematical Functions
Abstract—A typical floating-point environment includes support for a small set of about 30 mathematical functions such as exponential, logarithms and trigonometric functions. These functions are provided by mathematical software libraries (libm), typically in IEEE 754 single, double and quad precision.
This article suggests replacing this libm paradigm by a more general approach: the on-demand generation of numerical function code, on arbitrary domains and with arbitrary accuracies.
First, such code generation opens up the libm function space available to programmers. It may capture a much wider set of functions, and may capture even standard functions on non-standard domains and accuracy/performance points.
Second, writing libm code requires fine-tuned instruction selection and scheduling for performance, and sophisticated floating-point techniques for accuracy. Automating this task through code generation improves confidence in the code while enabling better design-space exploration, and therefore better time to market, even for the libm functions.
This article discusses, with examples, the new challenges of this paradigm shift, and presents the current state of open-source function code generators.
Index Terms—elementary functions, code generator, range reduction, reconstruction, interpolation.

I. INTRODUCTION & MOTIVATION

A. Standard Mathematical Libraries

The standard mathematical library (libm) provides, in a small set of precisions (single and double precision), a small set of mathematical functions:
• elementary functions [1] such as exponential and logarithm, sine, cosine, tangent and their inverses, hyperbolic functions and their inverses;
• and other "special" functions useful in various computing contexts, such as the power function x^y, erf, Γ or the Airy function [2].
Strictly speaking, the libm is specified in the C language standard (currently ISO/IEC 9899:2011). Among the other languages, some (C++, Fortran, Python) use the same library, and some redefine it in a more or less compatible way. The 2008 revision of the IEEE 754 floating-point standard [3] has attempted a common standardization. In addition, language standards also specify elementary functions on complex numbers, but for historical reasons not in the "mathematical library" section. Also, many operating systems or libraries offer, for convenience, more functions than strictly required by the libm. In this work we understand the term "libm" in its widest sense and address all these functions.

B. Performance of a libm

Performance of libm functions is of critical importance, in particular in scientific and financial computing. For example, profiling the SPICE electronic simulator shows that it spends most of its time in the evaluation of elementary functions [4]. The same holds for large-scale simulation and analysis code run at CERN [5]–[7].
The problem is that the optimal implementation of each function is very dependent on the technology. Mathematical support is provided by a combination of hardware and software, and the optimal repartition between hardware and software has evolved with time [8]. For instance, the first 80287 mathematical coprocessor, in 1985, included support for a range of elementary functions (albeit in microcode) in addition to the basic operations. It was later found that software could outperform such microcode. For instance, as memory got cheaper, large tables of precomputed values [9], [10] could be used to speed up the computation of a function. Furthermore, progress in compiler technology allowed for a deeper integration of elementary function software in the overall compilation process [11]–[13], another case for software-based functions. Today, even the relevance of hardware division and square root is disputed. However, at the same time, the table-based algorithms of the 90s are being replaced with high-degree polynomials [12] that behave better in the context of current highly parallel, memory-constrained multicores.

C. Limits of the library approach

1) Productivity of libm development: Writing a libm requires fine-tuned instruction selection and scheduling for performance, and sophisticated floating-point techniques for accuracy. Re-optimization of these mutually dependent goals for each new processor is a time-consuming and error-prone task. Besides, it is desirable that each function comes in several variants corresponding to a range of constraints on performance (e.g. optimized for throughput or optimized for latency) or accuracy. Some processor and system manufacturers (Intel, AMD, ARM, NVIDIA, HP, Apple) therefore employ large teams of engineers dedicated to libm maintenance.
With more limited manpower, the open-source mathematical libraries (most notably in the GNU glibc [14] and in Newlib, https://sourceware.org/newlib/) lag behind in performance, all the more as they have to support a much wider range of processors. However, the most accurate implementations are found in open-source efforts, with several correctly rounded functions provided by libraries such as IBM LibUltim [15] (now in the glibc) and more recently CRLibm [16].
2) Limited choice: Another problem is that the libm functions do not match all the needs of users.
First, the choice is limited. For instance, the web page http://project-mathlibs.web.cern.ch/project-mathlibs/mathTable.html mentions all the functions that are used by CERN simulations.
Only a fraction of this list is offered by the standard libm. The rest is programmed when needed, sometimes by programmers who lack the expertise of libm developers. This leads to duplicated efforts and, often, poorer quality (performance- or accuracy-wise) than libm functions.
Second, even the functions in the libm are often too good (and therefore slower than they could be) for an application. In recent standards, the range is specified as "as far as the input and output formats allow", while the accuracy is specified as "as accurate as the output format allows". This is the best possible specification for a generic library function, but it is overkill for many applications. If a programmer knows that the input to a cosine will remain within a period, and that about three decimal digits of accuracy will be enough considering the accuracy of the inputs, the libm cosf function is too accurate (7 decimal digits) on too large a range.
The present article claims that the solution to both previous issues (productivity and limited choice) is to automate libm development so that functions can be generated on demand for a wider set of contexts. This solution has already proven effective for other classes of mathematical software, most notably the ATLAS [17] project for linear algebra, and the FFTW [18] and SPIRAL [19] projects for fast Fourier transforms.

D. Use cases for generators of mathematical functions

We are currently focusing on two main use cases where we need a libm generator, both illustrated by Figure 1.
The first one targets the widest audience of programmers. It is a push-button approach that will try to generate code on a given domain and for a given precision for an arbitrary univariate function with continuous derivatives up to some order. The function may be provided as a mathematical expression, or even as an external library that is used as a black box. We call this approach the open-ended approach, in the sense that the function that can be input to the generator is arbitrary – which does not mean that the generator will always succeed in handling it. Section III will describe how this generator has evolved from simple polynomial approximations to the generation of more sophisticated evaluation schemes, including attempts at range reduction. Here, the criterion of success is that the generated code is better than whatever other approach the programmer would have to use (composition of libm functions, numerical integration if the function is defined by an integral, etc.). "Better" may mean faster, or more accurate, or better behaved in corner cases, etc.
Ideally, we would like a generator with a clear separation (as in Figure 1) between a front-end building an approximation scheme, and a back-end implementing it on a given target technology. A first challenge is that the front-end itself must be directed by the target context. This is easy to understand with table-based range reduction techniques, for instance: their relevance and efficiency greatly depend on the context. Formally capturing this complexity is currently out of reach. Fortunately, it is always possible to perform an empirical exploration of the parameter space: generate several variants, compile and time them, and pick the best.
We believe that the techniques presented in Section III can eventually be extended to capture all the functions of C11 on their whole range. However, this is currently not the case. There is a lot of human expertise in libm development that we are not yet able to automate. In particular, bivariate functions such as atan2 or pow, as well as some special functions, are currently out of reach. The second use case therefore focuses on assisting people who have this expertise, not yet on replacing them. It targets a much narrower audience of programmers, those in charge of providing the actual libm functionality to an operating system or compiler. Here the criterion of success is that the generated code is of comparable quality to hand-written code, but obtained much faster.
The chosen approach, reviewed in Section IV, consists in giving the libm developer the keys to the back-end. We thus aim at offering him/her a development framework in which he/she may describe the relevant evaluation schemes. The challenge here is to find a proper balance between two incompatible goals: 1/ this framework should be able to capture every trick that libm developers use, otherwise they will not adopt it; 2/ it should nevertheless raise the level of abstraction, so that a single description generates code for the variety of code flavors and targets we want to address. It should also be fully scriptable to enable design-space exploration, but always under full control of the libm developer. Section IV reviews a technical solution that matches these goals.
This second use case can be viewed as a pragmatic, bottom-up approach, where we embed existing hand-crafted code in a framework to make it more generic. The first, open-ended use case is more ambitious, more high-level, and top-down from the most abstract mathematical description of a function. These two approaches do meet, as illustrated in Figure 1, and we do not claim that there is a clear border between them. For instance, the libm developer working directly with the back-end will nevertheless invoke the evaluation scheme generator to solve sub-problems, typically after range reduction.

II. BACKGROUND IN ELEMENTARY FUNCTION IMPLEMENTATION

In order to understand the challenge of writing a code generator for mathematical functions, it is useful to know how they are implemented manually. To evaluate a function at some point, we may use a finite number of additions, multiplications, floating-point number manipulations, comparisons and precomputed tables. A generic technique along these rules is to approximate a function by tables and polynomials or piecewise polynomials. After filtering out special values like NaNs and infinities [3], there are usually three steps: argument reduction, polynomial approximation and reconstruction [1]. The following details these steps.

A. Polynomial approximation

A minimax polynomial approximation of a function f on a domain I minimizes the approximation error ε_approx = ||f − p||∞ over I among all the polynomials p of a given degree d. The Remez algorithm converges to a minimax polynomial. However, this algorithm produces real coefficients, which may not be representable with finite-precision numbers. Rounding these coefficients to floating-point numbers entails a loss of accuracy. This problem is solved in [20] by a modified Remez algorithm that finds a minimax-like polynomial among polynomials with floating-point coefficients.
These algorithms input f, I and the degree d, and return p and ε_approx. For a given f and I, the larger the degree d, the better the approximation (the smaller ε_approx). When implementing a libm-like mathematical function, we have an upper bound constraint on ε_approx, and we look for the smallest d that satisfies this bound. This approximation-error-driven search for d is slightly harder than classical polynomial approximation.
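To make this degree selection concrete, here is a minimal, self-contained C sketch of the idea (it is not part of the tools described in this article): it stands in truncated Taylor polynomials of exp for the floating-point minimax polynomials of [20], estimates the error by dense sampling instead of the safe supremum norms of [27], and uses a deliberately loose toy target of 2^−30, since reaching ε̄ = 2^−53 requires the genuine minimax machinery. All names and constants in the sketch are illustrative assumptions.

    #include <math.h>
    #include <stdio.h>

    /* Error of the degree-d Taylor polynomial of exp on [a,b], estimated by dense
     * sampling (a real generator would use a safe supremum-norm algorithm). */
    static double max_error_exp_taylor(int d, double a, double b) {
        double err = 0.0;
        for (int i = 0; i <= 10000; i++) {
            double x = a + (b - a) * i / 10000.0;
            double p = 0.0, term = 1.0;                 /* accumulate sum of x^k / k! */
            for (int k = 0; k <= d; k++) { p += term; term *= x / (k + 1); }
            double e = fabs(p - exp(x));
            if (e > err) err = e;
        }
        return err;
    }

    int main(void) {
        const double a = 0.0, b = 0.3;       /* implementation domain I            */
        const double eps_bar = 0x1p-30;      /* loose toy accuracy target          */
        const int d_max = 9;                 /* maximum allowed polynomial degree  */
        for (int d = 1; d <= d_max; d++) {   /* look for the smallest suitable d   */
            double eps = max_error_exp_taylor(d, a, b);
            printf("degree %d: estimated error %.3g\n", d, eps);
            if (eps <= eps_bar) { printf("selected degree: %d\n", d); break; }
        }
        return 0;
    }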
B. Argument reduction

When it is not possible to compute a polynomial of a small enough degree, the implementation domain I may be reduced. There are two basic techniques for this: split the domain into smaller subdomains, or use specific properties of the function to be implemented. In both cases the approximation problem is reduced to approximating a (possibly different) function f_r on a smaller domain.
1) Domain splitting: The main idea here is to partition I into subdomains, so that on each of them we can find a minimax polynomial approximation of small degree. The splitting can be uniform, logarithmic [21], or arbitrary [22]. In the first two cases the polynomial is selected based on a few bits of the input x; in the last case, several if-else statements are needed.
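As an illustration only (this is not code emitted by any of the generators discussed here), the following C sketch shows the typical shape of a uniformly split piecewise-polynomial evaluation: the subinterval index is read off a few leading bits of the input, and each subinterval carries its own low-degree polynomial. For self-containment the polynomials are degree-2 Taylor expansions of exp at the subinterval midpoints; a generator would store minimax coefficients instead.

    #include <math.h>
    #include <stdio.h>

    #define NSPLIT 8                     /* uniform split of [0,1) into 8 subintervals   */
    #define DEGREE 2                     /* degree of the polynomial on each subinterval */

    static double coeff[NSPLIT][DEGREE + 1];   /* one small polynomial per subinterval */
    static double center[NSPLIT];

    /* Fill the tables; a generator would compute fpminimax-style coefficients instead
     * of the degree-2 Taylor expansion of exp used here for self-containment. */
    static void build_tables(void) {
        for (int i = 0; i < NSPLIT; i++) {
            double m = (i + 0.5) / NSPLIT;       /* midpoint of subinterval i */
            center[i] = m;
            coeff[i][0] = exp(m);
            coeff[i][1] = exp(m);
            coeff[i][2] = exp(m) / 2.0;
        }
    }

    /* Piecewise evaluation for x in [0,1): the index comes from a few leading bits. */
    static double exp_split(double x) {
        int i = (int)(x * NSPLIT);               /* subinterval index                 */
        double t = x - center[i];                /* local variable on the subinterval */
        double p = coeff[i][DEGREE];
        for (int k = DEGREE - 1; k >= 0; k--)    /* Horner evaluation                 */
            p = p * t + coeff[i][k];
        return p;
    }

    int main(void) {
        build_tables();
        for (double x = 0.05; x < 1.0; x += 0.3)
            printf("x=%.2f  approx=%.8f  exp=%.8f\n", x, exp_split(x), exp(x));
        return 0;
    }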
2) Function-specific argument reduction: For some elementary functions, specific algorithms of argument reduction may be applied [1], [23]–[25]. They are based on mathematical properties such as b^x · b^y = b^(x+y). For example, the exponential function can be computed as follows [23]:

    e^x = 2^E · 2^(k−E) · 2^(x·log2(e) − k),

with a parameter w ∈ ℕ, k = ⌊2^w · x·log2(e)⌋ · 2^(−w) and E = ⌊k⌋. Thus E ∈ ℤ and k ∈ 2^(−w)·ℤ. Finally, writing the small quantity r = x − k·ln(2), we get

    e^x = 2^E · 2^(k−E) · e^r,

where the values of 2^(k−E) are stored in a table of size 2^w. The task is thus reduced to an approximation of e^r, where r takes values in a small range. This approximation can be computed with one polynomial or with an additional splitting if needed.
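The following C sketch illustrates this table-driven reduction under simplifying assumptions (w = 3, a short Taylor polynomial standing in for the minimax polynomial, no special-case handling, no compensated arithmetic); it is an illustration of the scheme above, not code produced by the generator.

    #include <math.h>
    #include <stdio.h>

    #define W 3                                    /* table of size 2^W            */
    static double tbl[1 << W];                     /* tbl[j] = 2^(j / 2^W)         */

    static void build_table(void) {
        for (int j = 0; j < (1 << W); j++)
            tbl[j] = pow(2.0, (double)j / (1 << W));
    }

    static double exp_reduced(double x) {
        const double LOG2E = 1.4426950408889634;   /* log2(e) */
        const double LN2   = 0.6931471805599453;   /* ln(2)   */
        /* k is a multiple of 2^-W close to x*log2(e); E is its integer part. */
        double k = floor(x * LOG2E * (1 << W)) / (1 << W);
        int    E = (int)floor(k);
        int    j = (int)((k - E) * (1 << W));      /* table index for 2^(k-E)      */
        double r = x - k * LN2;                    /* reduced argument, 0 <= r < ln(2)/2^W */
        /* Degree-5 Taylor polynomial for e^r (a minimax polynomial in real code). */
        double p = 1.0 + r*(1.0 + r*(0.5 + r*(1.0/6 + r*(1.0/24 + r*(1.0/120)))));
        return ldexp(tbl[j] * p, E);               /* reconstruction: 2^E * 2^(k-E) * e^r */
    }

    int main(void) {
        build_table();
        for (double x = 0.0; x <= 5.0; x += 1.25)
            printf("x=%.2f  approx=%.6f  exp=%.6f\n", x, exp_reduced(x), exp(x));
        return 0;
    }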
Sollya provides state-of-the-art polynomial approximation [20] and safe algorithms to compute ε_approx = ||f − p||∞ [27]; it also provides a scripting language.
Gappa [16] is a formal proof assistant that is able to manage the accumulation of floating-point errors in most libm codes. Compared to [16], in the present work the Gappa proof scripts are not written by hand, but generated along with the C code. Interestingly, Gappa is itself a generator (it generates Coq or HOL formal proofs).

III. APPROXIMATION TECHNIQUES FOR BLACKBOX FUNCTIONS

A. Overview

This section describes an open-source code generator written in Sollya. It inputs a parametrization file and produces the corresponding C code. The parametrization file includes the function f, its implementation domain I, the desired accuracy ε̄, the maximum degree d_max for the approximation polynomials, etc. The function may be specified as a mathematical expression, or as an external library. An important feature of this approach is that the tool does not need an explicit formula for the specified function. It only accesses the function as a numerical blackbox.
All the tool requires from this blackbox is to be able to return an arbitrarily accurate numerical evaluation of the function and of its first few derivatives over an interval. This way, the tool will work for functions described not only as closed-form expressions, but also as integrals, inverse functions, etc. The function should however be reasonably fast to evaluate.
This blackbox encapsulation of the function does not prevent the tool from exploiting the function-specific range reductions presented in Section II-B. The following will show that the required mathematical properties can be detected and exploited from a blackbox function.
The tool also supports higher-than-double accuracy, using double-double and triple-double arithmetic.

B. Different levels of code generation

We can structure the function generation process in Metalibm-open in three levels.
a) First level (polynomial approximation): Relatively "simple" function flavors are treated on the first level. The generator starts by checking whether the maximum degree d_max is sufficient to approximate the function f by a polynomial with an error bounded by ε̄. If so, C code and its Gappa proof are generated. Otherwise, the second or third levels are used.
b) Second level (piecewise polynomial approximation): This level uses domain splitting and piecewise polynomial approximation. More details can be found in [22].
c) Third level (exploiting algebraic properties): On the third level of function flavor generation we try to detect mathematical properties and to use them for argument reduction. The properties currently detected include

    f(x + y) = f(x)·f(y)
    f(x + C) = f(x)
    f(x) + f(y) = f(x·y)
    f(−x) = f(x)
    f(−x) = −f(x)

They respectively correspond to exponential functions (b^(x+y) = b^x·b^y), periodic functions (e.g. sin(x + 2πk) = sin(x)), logarithms (log_b(x) + log_b(y) = log_b(x·y)) and symmetric functions (both even and odd). When such a property is detected, the tool uses the corresponding argument reduction. It then needs an approximation scheme on the reduced domain, for which it performs a recursive call to the first or second level.
As the tool only accesses the function as a blackbox, it is unable to prove that the property is actually true. However, all it needs is to ensure that it is true up to some accuracy (the accuracy needed for the range reduction to work). This can be done purely numerically.
As an example, let us show how to detect the property f(x + y) = f(x)·f(y), which corresponds to the family of exponential functions b^(x+y) = b^x·b^y for some base b ∈ ℝ. First, two different points x and y are chosen in I, and the tool checks whether there exists |ε| < ε̄ such that f(x + y) = f(x)·f(y)·(1 + ε). If not, the property is not true. If yes, the tool deduces a candidate b = exp(ln(f(x))/x) for some random x ∈ I. Only in this case does it check that the property holds to the required accuracy, by computing

    ε̃ = || b^x / f(x) − 1 ||∞ over I

and checking whether ε̃ ≤ ε̄. Other properties can be detected in the same way. However, handling the error bound ε̃ requires some analysis for certain function properties.
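The detection just described can be mimicked in a few lines of C, shown below purely as an illustration: the actual tool performs these evaluations in Sollya with arbitrary precision and interval enclosures, whereas this sketch uses double precision, a blackbox hard-wired to f(x) = 3^x, a toy tolerance of 2^−40 and plain sampling for the final norm. All of these choices are assumptions made for the sake of a runnable example.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* The blackbox function: the detector only calls it and never looks inside. */
    static double blackbox(double x) { return pow(3.0, x); }

    int main(void) {
        const double lo = 0.0, hi = 0.5;         /* domain I                         */
        const double eps_bar = 0x1p-40;          /* illustrative tolerance           */
        srand(42);

        /* Step 1: quick rejection test at two points x, y of I. */
        double x = 0.1, y = 0.3;
        double eps = blackbox(x + y) / (blackbox(x) * blackbox(y)) - 1.0;
        if (fabs(eps) >= eps_bar) { printf("property f(x+y)=f(x)f(y) rejected\n"); return 0; }

        /* Step 2: deduce a candidate base b at a random point (kept away from 0). */
        double xr = 0.1 + (hi - 0.1) * rand() / (double)RAND_MAX;
        double b  = exp(log(blackbox(xr)) / xr);

        /* Step 3: check || b^x / f(x) - 1 || over I, here by dense sampling. */
        double err = 0.0;
        for (int i = 0; i <= 1000; i++) {
            double t = lo + (hi - lo) * i / 1000.0;
            double e = fabs(pow(b, t) / blackbox(t) - 1.0);
            if (e > err) err = e;
        }
        printf("candidate base b = %.15g, max relative deviation = %.3g\n", b, err);
        printf(err <= eps_bar ? "exponential property confirmed\n"
                              : "property not confirmed to the required accuracy\n");
        return 0;
    }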
C. Reconstruction

The final step in the implementation of a mathematical function is reconstruction, i.e. the sequence of operations (or just a formula) that has to be executed to evaluate the function at some point x of the initial domain I. The reconstruction may be tricky if the generation was done on the second level and the produced code has to be vectorizable. To make the reconstruction vectorizable, a solution is to exhibit some mapping function m(x) that returns the index of the corresponding interval [28].
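As a hedged illustration of such a mapping (not taken from [28] and not the generator's output), the C sketch below uses a logarithmic split of [1, 16) into binades: m(x) is simply the binary exponent of x, obtained without branching through ilogb, so the per-subinterval lookup can be vectorized instead of compiling to an if-else chain. The coarse degree-2 polynomials and the function choice (log) are only there to keep the sketch short and self-contained.

    #include <math.h>
    #include <stdio.h>

    #define NSPLIT 4                     /* logarithmic split of [1,16): [2^i, 2^(i+1)) */
    static double c0[NSPLIT], c1[NSPLIT], c2[NSPLIT], mid[NSPLIT];

    static void build_tables(void) {
        for (int i = 0; i < NSPLIT; i++) {
            double m = 1.5 * (1 << i);           /* midpoint of [2^i, 2^(i+1))        */
            mid[i] = m;
            c0[i] = log(m);                      /* degree-2 Taylor of log at m       */
            c1[i] = 1.0 / m;                     /* (a generator would use minimax)   */
            c2[i] = -0.5 / (m * m);
        }
    }

    /* Branch-free mapping m(x): the subinterval index is the binary exponent of x. */
    static int m_of_x(double x) { return ilogb(x); }

    static double log_split(double x) {
        int i = m_of_x(x);
        double t = x - mid[i];
        return c0[i] + t * (c1[i] + t * c2[i]);
    }

    int main(void) {
        build_tables();
        for (double x = 1.1; x < 16.0; x += 3.7)
            printf("x=%5.2f  approx=%.4f  log=%.4f\n", x, log_split(x), log(x));
        return 0;
    }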
D. Several examples of code generation

The first two examples correspond to f = exp with the target accuracy ε̄ = 2^(−53); the polynomial degree is bounded by d_max = 9. Then we test a toy composite function and a sigmoid function as used in neural networks.
1) For f = exp on the small domain I = [0, 0.3], the first level is enough. The generated code only consists of the polynomial coefficients and a polynomial evaluation function. The function generator does not handle special cases for the moment (infinities, NaNs, overflows, etc.), so for larger domains this filtering has to be added manually; this is addressed in Section IV. This function flavor is about 1.5 times faster than the standard exp function from the glibc libm.
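To give an idea of the shape of such first-level code, here is a hedged C mock-up: a static coefficient table and a Horner evaluation routine. For self-containment the table holds the degree-9 Taylor coefficients of exp, whereas the generated code embeds fpminimax coefficients together with a Gappa-proved error bound; the names and layout are illustrative assumptions.

    #include <math.h>
    #include <stdio.h>

    /* Degree-9 polynomial for exp on [0, 0.3]. Taylor coefficients 1/k! are used
     * here for self-containment; generated code would embed minimax coefficients. */
    static const double P[10] = {
        1.0, 1.0, 0.5, 1.0/6, 1.0/24, 1.0/120, 1.0/720, 1.0/5040, 1.0/40320, 1.0/362880
    };

    static double exp_poly(double x) {
        double r = P[9];
        for (int k = 8; k >= 0; k--)      /* Horner scheme */
            r = r * x + P[k];
        return r;
    }

    int main(void) {
        for (int i = 0; i <= 3; i++) {
            double x = i * 0.1;
            printf("x=%.1f  poly=%.17g  exp=%.17g\n", x, exp_poly(x), exp(x));
        }
        return 0;
    }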
2) For exponential flavors on larger domains, the table-driven algorithm for argument reduction mentioned in Section II is used [23]. We enlarge the domain from the previous example to I = [0, 5] and set w = 4 for the table (the table size is 2^w). The generation is performed on the third level: the family of exponential functions is detected and the domain is reduced to [−log(2)/2^(w+1), log(2)/2^(w+1)]. Then a recursive call to the first level of code generation is performed. The generated code for this flavor executes in between 10 and 60 machine cycles, while the upper bound for the libm code is 80 machine cycles. However, the generated code computes most inputs in less than 25 cycles, while the libm exponential executes in between 15 and 35 cycles.
3) A more interesting example is f(x) = sin(cos(x/3) − 1) on the domain I = [−1, 1] with 48 bits of accuracy. The standard libm does not support composite functions: there will be two function calls (both with special-case handling) and a division. In the generated code the composite function is approximated by polynomials. Even though the target error is less than the mantissa length of double precision, the generated code wins both in accuracy and performance. Due to cancellation at zero, the error of the libm composite function explodes (Figure 3) while the error of the generated function stays bounded (Figure 2). Execution of the generated code takes between 8 and 470 machine cycles, while execution of the libm analog of this flavor takes more than 150 cycles and may reach 650 cycles in the worst case.
4) Another example is the sigmoid function f(x) = 1/(1 + e^(−x)) on the domain I = [−2, 2] with 52 correct bits. No algebraic property is detected, so the generation is done on the second level. The generated code and the libm code are of comparable accuracy and performance: execution takes between 25 and 250 cycles, with most cases done within 50 cycles. The polynomial degree for the generation is bounded by d_max = 9, and the domain was split into 22 subintervals.

E. Open-ended code generation: wrap-up and outlook

As demonstrated by the examples shown in the last section and by its own 10300 lines of code, open-ended code generation has reached a certain level of maturity. It is able to quickly generate code for functions defined by various means, including black-box definitions. The produced codes do not yet reach the performance a manual implementation would eventually reach, but they are available now, at reduced cost. The final accuracy is bounded by construction and, in the case of composite functions, better than for plain libm function composition.

A. General overview

All the framework is implemented in Python, a language chosen to ensure that the framework is fully scriptable. More importantly, an evaluation scheme is itself described in Python. Technically, one first defines a Python variable as being the output of the generated code; then all computations that eventually affect this variable are considered as belonging to the evaluation scheme.
A similar mechanism enables the designer to embed, in the same Python code, the analysis and proof of numerical properties of the evaluation scheme. First, intervals may be described and manipulated directly in Python using interval arithmetic. This is useful to script range and error analysis. The fact that interval computations are scripted in Python is a non-automatic, but practical, way out of the correlation issues that plague naive interval arithmetic. Second, pure mathematical expressions may also be described, still in the same classical Python syntax. They can be used to describe what the code is supposed to compute, thus enabling delegation to Sollya of the computation of ε_approx, and to Gappa of the accumulation of rounding errors [16].
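The C sketch below is only meant to illustrate the kind of directed-rounding interval arithmetic such range analysis relies on (the framework itself scripts it in Python): each operation rounds the lower bound downward and the upper bound upward, so the computed enclosure remains safe under floating-point rounding. The types and helper names are assumptions of this sketch, and strict compilers may additionally require FENV_ACCESS to be enabled.

    #include <fenv.h>
    #include <stdio.h>

    typedef struct { double lo, hi; } interval;

    /* Addition with outward rounding: lower bound rounded down, upper bound up. */
    static interval iadd(interval a, interval b) {
        interval r;
        fesetround(FE_DOWNWARD); r.lo = a.lo + b.lo;
        fesetround(FE_UPWARD);   r.hi = a.hi + b.hi;
        fesetround(FE_TONEAREST);
        return r;
    }

    /* Multiplication of intervals with non-negative bounds (enough for this sketch). */
    static interval imul_pos(interval a, interval b) {
        interval r;
        fesetround(FE_DOWNWARD); r.lo = a.lo * b.lo;
        fesetround(FE_UPWARD);   r.hi = a.hi * b.hi;
        fesetround(FE_TONEAREST);
        return r;
    }

    int main(void) {
        interval x = {0.0, 0.3};                      /* range of the input    */
        interval c = {0.5, 0.5};                      /* a (point) coefficient */
        interval t = iadd(imul_pos(c, x), (interval){1.0, 1.0});  /* 1 + c*x   */
        printf("enclosure of 1 + 0.5*x on [0,0.3]: [%.17g, %.17g]\n", t.lo, t.hi);
        return 0;
    }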
This may seem like a lot of overloading of the Python syntax. Below all this, we also keep standard Python to script the code generation, in several steps.
the first periods, but avoid over-testing the very small numbers (which represent almost half of the floating-point numbers) where sin x ≈ x. Etc.
The proposed framework therefore defines a class for a C11 function with its test infrastructure. Instances of this class (one per function) may overload some methods of this class, such as the generator of regression tests (the default tests cover numbers such as the signed zeroes, infinities and NaNs, and the extreme threshold values of the subnormal and normal domains) and one or a few generators of random inputs.
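A minimal C sketch of such a default test generator is given below, purely as an illustration of the idea (the actual infrastructure is a Python class whose methods are overloaded per function): it lists the directed special values and then draws a few random inputs from a function-specific range, here the first periods of sine. The ranges and counts are illustrative assumptions.

    #include <float.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Default directed test values shared by every C11 function under test. */
        double directed[] = {
            +0.0, -0.0, INFINITY, -INFINITY, NAN,
            DBL_TRUE_MIN,          /* smallest positive subnormal (C11) */
            DBL_MIN,               /* smallest positive normal          */
            DBL_MAX                /* largest finite double             */
        };
        size_t n = sizeof directed / sizeof directed[0];
        for (size_t i = 0; i < n; i++)
            printf("directed: %.17g\n", directed[i]);

        /* Function-specific random inputs: e.g. for sin, sample the first periods. */
        const double PI = 3.141592653589793;
        srand(1);
        for (int i = 0; i < 5; i++) {
            double x = 4.0 * PI * rand() / (double)RAND_MAX;
            printf("random:   %.17g\n", x);
        }
        return 0;
    }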
E. Wrapping up: current state of the project

The backend code generator (processor classes, generic optimization steps and proof generators) currently consists of 3800 lines of Python code (counted by the CLOC utility) for the open-source part covering the generic and x86-optimized processor classes. In addition there are also a few proprietary files related to the Kalray processor.
The C11 function generators currently amount to 2482 lines of code. This includes exp, log, log1p, log2, and sin/cos at various degrees of completion (they generate working code, but not always of performance comparable to handwritten code). There are also inverse, division, inverse square root, and square root implementations, essentially intended for processors without a hardware divider, such as the IA64 or the Kalray. Inverse approximation is actually of more general use and can replace standard division in situations where it is needed with accuracy lower or higher than the floating-point precision. A typical function generator consists of 200-300 lines of code. Considering that the generated code is typically 100-200 lines, that the Gappa proof is about the same size, and that one generator produces many code flavors, the code generator approach should be much easier to maintain.
Near-term future work includes adding ARM processors and more functions. Medium-term future work includes an OpenCL backend to target GPUs (increasingly used as floating-point accelerators), more evaluation scheme optimization using CGPE [29], and the generation of correctly rounded function flavors.

V. CONCLUSIONS AND FUTURE WORK

This article discusses two approaches to addressing the challenges faced by libm developers. The first is automated generation of evaluation schemes, to address the large number of functions of interest. The second is a specific development framework, to address the large number of hardware targets of interest. These two approaches were developed independently, and current work focuses on integrating them further.
A good case study for working on such integration is the correct rounding of elementary functions. It presents several challenges, such as function evaluation in larger-than-standard precisions or less common formats, or the vectorization of a technique centered on a run-time test [15].
In addition, both approaches, as well as the final integrated code generator, should also integrate notions of code tuning through code timing, as in software packages such as ATLAS. This allows for a simpler modeling of processors, or the handling of (out-of-order) processor architectures where the processor behavior is too hard to model anyway.
Finally, even though the codes produced by our two approaches are already hardened through the use of Gappa as a formal proof tool, integration of code generation into certified FP code development suites such as Why, Frama-C or even model checkers might become of some interest.
REFERENCES

[1] J.-M. Muller, Elementary Functions: Algorithms and Implementation. Secaucus, NJ, USA: Birkhäuser Boston, Inc., 1997.
[2] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions, ser. Applied Math. Series 55. National Bureau of Standards, Washington, D.C., 1964.
[3] "IEEE standard for floating-point arithmetic," IEEE 754-2008, also ISO/IEC/IEEE 60559:2011, Aug. 2008.
[4] N. Kapre and A. DeHon, "Accelerating SPICE model-evaluation using FPGAs," Field-Programmable Custom Computing Machines, pp. 37–44, 2009.
[5] V. Innocente, "Floating point in experimental HEP data processing," in 2nd CERN Openlab/INTEL Workshop on Numerical Computing, 2012.
[6] P. Danilo, "The VDT mathematical library," in 2nd CERN Openlab/INTEL Workshop on Numerical Computing, 2012.
[7] J. Apostolakis, A. Buckley, A. Dotti, and Z. Marshall, "Final report of the ATLAS detector simulation performance assessment group," CERN/PH/SFT, CERN-LCGAPP-2010-01, 2010. [Online]. Available: http://sftweb.cern.ch/AAdocuments
[8] G. Paul and M. W. Wilson, "Should the elementary functions be incorporated into computer instruction sets?" ACM Transactions on Mathematical Software, vol. 2, no. 2, pp. 132–142, 1976.
[9] P. T. P. Tang, "Table-driven implementation of the exponential function in IEEE floating-point arithmetic," ACM Transactions on Mathematical Software, vol. 15, no. 2, pp. 144–157, 1989.
[10] S. Gal and B. Bachelis, "An accurate elementary mathematical library for the IEEE floating point standard," ACM Transactions on Mathematical Software, vol. 17, no. 1, pp. 26–45, 1991.
[11] P. Markstein, IA-64 and Elementary Functions: Speed and Precision, ser. Hewlett-Packard Professional Books. Prentice Hall, 2000.
[12] M. Cornea, J. Harrison, and P. T. P. Tang, Scientific Computing on Itanium-based Systems. Intel Press, 2002.
[13] P. Markstein, "Accelerating sine and cosine evaluation with compiler assistance," in 16th Symposium on Computer Arithmetic, J.-C. Bajard and M. Schulte, Eds. IEEE Computer Society, 2003, pp. 137–140.
[14] S. Loosemore, R. M. Stallman, et al., The GNU C Library Reference Manual. Free Software Foundation, Inc.
[15] A. Ziv, "Fast evaluation of elementary mathematical functions with correctly rounded last bit," ACM Transactions on Mathematical Software, vol. 17, no. 3, pp. 410–423, 1991.
[16] F. de Dinechin, C. Q. Lauter, and G. Melquiond, "Certifying the floating-point implementation of an elementary function using Gappa," IEEE Transactions on Computers, vol. 60, no. 2, pp. 242–253, 2011. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/TC.2010.128
[17] R. C. Whaley and A. Petitet, "Minimizing development and maintenance costs in supporting persistently optimized BLAS," Software: Practice and Experience, vol. 35, no. 2, pp. 101–121, February 2005, http://www.cs.utsa.edu/~whaley/papers/spercw04.ps.
[18] M. Frigo and S. G. Johnson, "The design and implementation of FFTW3," Proceedings of the IEEE, vol. 93, no. 2, pp. 216–231, 2005, special issue on "Program Generation, Optimization, and Platform Adaptation".
[19] M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, "SPIRAL: Code generation for DSP transforms," Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Adaptation", vol. 93, no. 2, pp. 232–275, 2005.
[20] N. Brisebarre and S. Chevillard, "Efficient polynomial L∞-approximations," in 18th IEEE Symposium on Computer Arithmetic (ARITH-18), Montpellier, France, June 2007, pp. 169–176. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/ARITH.2007.17
[21] D.-U. Lee, P. Cheung, W. Luk, and J. Villasenor, "Hierarchical segmentation schemes for function evaluation," IEEE Transactions on VLSI Systems, vol. 17, no. 1, 2009.