Numerical Differentiation using Local Chebyshev Approximation
Stefan H. Reiterer
February 5, 2021
1 Motivation
In applied mathematics, especially in optimization, functions are often only provided as so-called "black boxes" by software packages or very complex algorithms, which makes automatic differentiation very complicated or even impossible. Hence one seeks a numerical approximation of the derivative.
Unfortunately, numerical differentiation is a difficult task in itself, and it is well known to be numerically unstable. There are many works on this topic, including the usage of (global) Chebyshev approximations. Chebyshev approximations have the great property that they converge very fast if the function is smooth. Nevertheless, those approaches have several drawbacks, since in practice functions are not smooth, and a global approximation needs many function evaluations.
Nevertheless there is hope, since functions in real-world applications are usually smooth except at finitely many points, corners, or edges. This motivates a local Chebyshev approach, where the function is only approximated locally, so that the Chebyshev approximation still yields a fast approximation of the desired function. We will study such an approach in this work and provide a numerical example.
Disclaimer: This work was done as private research during my study years and is not related to my current affiliation at all. No funding was received for this work.
2 Chebyshev Polynomials and One Dimensional Functions
First we recall some well-known definitions. The following definitions and properties are taken from [1].
Definition 2.1 (Chebyshev polynomials). For $n \in \mathbb{N}_0$ the $n$-th Chebyshev polynomial $T_n : [-1, 1] \to \mathbb{R}$ is given by $T_n(x) = \cos(n \arccos(x))$. From this definition we see that the Chebyshev polynomial attains its extrema at
$$x_k := -\cos\left(\frac{k\pi}{n}\right) \quad \text{for } k = 0, \dots, n.$$
The points $(x_k)_{k=0}^{n}$ are the so-called Gauss-Lobatto grid points.
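As a small illustration (a sketch added here, with self-chosen function names), the following NumPy snippet builds the Gauss-Lobatto grid for a given n and evaluates T_n on it; the values alternate between -1 and +1, confirming that the grid points are the extrema of T_n.

import numpy as np

def gauss_lobatto_points(n):
    """Return the n+1 Gauss-Lobatto points x_k = -cos(k*pi/n), k = 0, ..., n."""
    k = np.arange(n + 1)
    return -np.cos(k * np.pi / n)

def chebyshev_T(n, x):
    """Evaluate the n-th Chebyshev polynomial T_n(x) = cos(n*arccos(x))."""
    return np.cos(n * np.arccos(x))

n = 5
xk = gauss_lobatto_points(n)
print(chebyshev_T(n, xk))   # approximately [-1, 1, -1, 1, -1, 1]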
The best way to approximate functions via Chebyshev polynomials is captured by the following

Proposition 2.2 (Chebyshev series & Co). Set $\omega(x) := \frac{2}{\pi}(1 - x^2)^{-1/2}$; then
$$\langle f, g \rangle_\omega := \int_{-1}^{1} f(x) g(x) \omega(x) \, dx$$
forms the (weighted) scalar product of the Hilbert space
$$L_{2,\omega} := \left\{ f : [-1, 1] \to \mathbb{R} \;\middle|\; \|f\|_{L_{2,\omega}} = \sqrt{\langle f, f \rangle_\omega} < \infty \right\}.$$
Then for a function $f \in L_{2,\omega}$ the Chebyshev coefficients
$$a_n := \langle f, T_n \rangle_\omega$$
exist, and the Chebyshev series
$$\frac{a_0}{2} + \sum_{n=1}^{\infty} a_n T_n(x)$$
converges in the $L_{2,\omega}$ sense to $f$.
Proof. These well-known facts follow from applying classical Fourier theory to $\tilde{f}(t) = f(\cos(t)) \in L_2(0, \pi)$.
Since Chebyshev polynomials are "cosines in disguise", other important properties from Fourier theory carry over to Chebyshev series, such as spectral convergence for smooth functions.
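To illustrate this connection, here is a small sketch (our own; the node count and the composite trapezoidal rule are arbitrary choices) that computes the coefficients $a_n = \langle f, T_n \rangle_\omega$ via the substitution $x = \cos t$, which turns them into ordinary Fourier-cosine coefficients $a_n = \frac{2}{\pi} \int_0^\pi f(\cos t) \cos(nt) \, dt$.

import numpy as np

def chebyshev_coefficient(f, n, num_nodes=2049):
    """Approximate a_n = <f, T_n>_w via the substitution x = cos(t), i.e.
    a_n = (2/pi) * int_0^pi f(cos t) cos(n t) dt (composite trapezoidal rule)."""
    t = np.linspace(0.0, np.pi, num_nodes)
    vals = f(np.cos(t)) * np.cos(n * t)
    dt = np.pi / (num_nodes - 1)
    weights = np.full(num_nodes, dt)
    weights[0] *= 0.5
    weights[-1] *= 0.5
    return (2.0 / np.pi) * np.sum(weights * vals)

# For a smooth function such as exp the coefficients decay spectrally fast:
print([round(chebyshev_coefficient(np.exp, n), 10) for n in range(6)])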
Another way to approximate a function is to use the Chebyshev interpolation polynomial $C_n = \sum_{k=0}^{n} b_k T_k$, which is uniquely defined by $f(x_k) = C_n(x_k)$ for $k = 0, \dots, n$.
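A minimal sketch of such an interpolation, based on numpy.polynomial.chebyshev (chebpts2 returns the Gauss-Lobatto grid, chebfit computes the coefficients $b_k$); the helper name chebyshev_interpolant is our own.

import numpy as np
from numpy.polynomial import chebyshev as C

def chebyshev_interpolant(f, n):
    """Chebyshev interpolation polynomial C_n of degree n on the
    Gauss-Lobatto grid, returned as its coefficients (b_0, ..., b_n)."""
    xk = C.chebpts2(n + 1)          # Chebyshev points of the second kind (Gauss-Lobatto)
    bk = C.chebfit(xk, f(xk), n)    # exact interpolation: n+1 points, degree n
    return bk

b = chebyshev_interpolant(np.exp, 8)
x = np.linspace(-1.0, 1.0, 5)
print(np.max(np.abs(C.chebval(x, b) - np.exp(x))))  # small interpolation error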
Although the Chebyshev series yields the best approximation, there is a little-known relation between the two approximation types, namely the
Proposition 2.3. Let $f : [-1, 1] \to \mathbb{R}$ be a function with
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n T_n(x)$$
and
$$\frac{|a_0|}{2} + \sum_{n=1}^{\infty} |a_n| < \infty.$$
Further, for $f_N(x) := \frac{a_0}{2} + \sum_{n=1}^{N} a_n T_n(x)$, let
$$E_T(N) := \sup_{x \in [-1, 1]} |f(x) - f_N(x)|$$
be the truncation error. Then
$$E_T(N) \le \sum_{n=N}^{\infty} |a_n|,$$
and for the interpolation error
$$\sup_{x \in [-1, 1]} |f(x) - C_N(x)| \le 2 E_T(N).$$
Hence the penalty for using interpolation instead of truncation is at most a factor of two! Additionally, the coefficients $b_k$ of the interpolation polynomial $C_n$ are related to the exact coefficients $a_k$ by the identity
$$b_k = a_k + \sum_{j=1}^{\infty} \left( a_{k+2jn} + a_{-k+2jn} \right). \qquad (1)$$
That means the approximated coefficients differ from the exact coefficients by the aliasing error, and hence this error vanishes with $\mathcal{O}(a_n)$.
Proof. See [1, Thm. 6 & Thm. 21].
Since with Chebyshev interpolation we are close to the realm of the DFT, additional methods from signal processing, like denoising, could be considered to handle numerical distortions.
To work with a function $g : [a, b] \to \mathbb{R}$, a generalized Chebyshev interpolation for $g$ can be achieved by using an affine transformation $\varphi : [-1, 1] \to [a, b]$ and interpolating the function $f = g \circ \varphi$.
Since computing the derivative of a polynomial can be done exactly, we could use the derivative of $C_N$ for some $N$ as the numerical derivative of $f$. This means
$$f' \approx C_N'.$$
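A possible realization of both steps, mapping a general interval $[a, b]$ to $[-1, 1]$ as described above and differentiating the interpolant exactly, is sketched below; it relies on numpy.polynomial.Chebyshev, whose domain argument handles the affine map, and the function name and parameters are our own choices.

import numpy as np
from numpy.polynomial import Chebyshev
from numpy.polynomial import chebyshev as C

def chebyshev_derivative(g, a, b, n):
    """Approximate g' on [a, b] by differentiating the degree-n Chebyshev
    interpolant of g; the affine map between [a, b] and [-1, 1] is handled
    by the `domain` argument of numpy.polynomial.Chebyshev."""
    xk = C.chebpts2(n + 1)                       # Gauss-Lobatto grid on [-1, 1]
    tk = 0.5 * (a + b) + 0.5 * (b - a) * xk      # mapped grid on [a, b]
    bk = C.chebfit(xk, g(tk), n)
    CN = Chebyshev(bk, domain=[a, b])
    return CN.deriv()                            # a polynomial, evaluable on [a, b]

dCN = chebyshev_derivative(np.sin, 0.0, 2.0, 10)
print(abs(dCN(1.0) - np.cos(1.0)))               # close to machine precision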
Then how about the errors? First recall the identity $T_n'(x) = n U_{n-1}(x)$. Hence for the truncation error we have
$$f'(x) - f_N'(x) = E_N'(x) = \sum_{n=N}^{\infty} n a_n U_{n-1}(x).$$
Considering that $|U_{n-1}|$ attains its maximum on $[-1, 1]$ at $\pm 1$ with $U_{n-1}(\pm 1) = (\pm 1)^{n-1} n$, we immediately see that
$$\sup_{x \in [-1, 1]} |f'(x) - f_N'(x)| \le \sum_{n=N}^{\infty} n |a_n| \sup_{x \in [-1, 1]} |U_{n-1}(x)| \le \sum_{n=N}^{\infty} n^2 |a_n| = \mathcal{O}(N^2 a_N).$$
For the interpolation polynomial $C_N$ we get, by using the aliasing identity (1) and rearranging terms as in the proof of [1, Thm. 21], the error estimate
$$|f'(x) - C_N'(x)| \le 2 \sum_{n=N}^{\infty} n^2 |a_n| = \mathcal{O}(N^2 a_N).$$
Hence we have the
Proposition 2.4. For $f : [-1, 1] \to \mathbb{R}$ smooth we have the error estimate
$$\sup_{x \in [-1, 1]} |f'(x) - C_N'(x)| \le 2 E_T'(N) = \mathcal{O}(N^2 a_N),$$
where $E_T'(N)$ denotes the truncation error of $f'$. Hence the penalty for using interpolation instead of truncation when differentiating is at most a factor of two!
3 Practical Considerations in the 1D Case and Generalizations
In this section we consider a function $f : [a, b] \to \mathbb{R}$ which is continuous and piecewise smooth, but not differentiable at a finite set of points $(y_k)_{k \in I} \subset [a, b]$ (with $I$ finite and allowed to be empty). If we now interpolate the function $f$ on the interval $[a, b]$, we know from Fourier theory that the error will converge only slowly to zero. We also note that it is easy to see (from variable transformations) that the error estimate depends on the length of the interval $[a, b]$ through the factor $\left(\frac{b-a}{2}\right)^N$. Hence if one restricts the function $f$ to a small interval, one achieves more precise interpolation results with lower degrees of the interpolation polynomial. If we restrict the function $f$ to some interval $[c, d] \subset [a, b]$ with
1. $d - c < 1$ and
2. $y_i \notin (c, d)$ for $i \in I$,
then the sequence of Chebyshev interpolation polynomials $(C_N)_{N \in \mathbb{N}}$ converges very fast (with spectral convergence) and uniformly to the original function on the interval $[c, d]$.
From Proposition 2.4 we know that this then also applies to $C_N'$, since the error decays with $\mathcal{O}(N^2 a_N)$. Nevertheless we also see that for practical use it is important to ensure smoothness on the observed interval $[c, d]$, because of the factor $N^2$. This leads to the following definition.
Definition 3.1. Let $f : [a, b] \to \mathbb{R}$ be continuous and piecewise smooth; then we call the Chebyshev interpolation polynomial $C_{N,[c,d]} : [c, d] \to \mathbb{R}$ of $f|_{[c,d]}$ the local smooth Chebyshev polynomial iff
$$(y_k)_{k \in I} \cap (c, d) = \emptyset.$$
This motivates the following algorithm to compute derivatives of a function $f$ (if the points $Y := (y_k)_{k \in I}$ are known):
# Inputs: function f, point x, estimated interval half-length h,
# set of non-differentiable points Y, number of interpolation points N,
# and the domain boundaries a, b of f.
def compute_derivative(f, x, h, Y, N, a, b):
    if x not in Y:
        # classical derivative
        c = x - h; d = x + h
        # ensure that [c, d] stays inside [a, b] and that f is smooth on [c, d]
        while c < a or b < d or intersection(Y, [c, d]) != set():
            h = make_h_smaller(h)
            c = x - h; d = x + h
        CN = compute_interpolation(f, c, d, N)
        CN_prime = derive_polynomial(CN)
        return CN_prime(x)
    else:  # x in Y
        # subgradient: one-sided intervals left and right of x
        c = [x - h, x]; d = [x, x + h]
        while any(c[i] < a or b < d[i] or
                  intersection(Y, (c[i], d[i])) != set() for i in [0, 1]):
            h = make_h_smaller(h)
            c = [x - h, x]; d = [x, x + h]
        CN = [compute_interpolation(f, c[i], d[i], N) for i in [0, 1]]
        CN_prime = [derive_polynomial(CN[i]) for i in [0, 1]]
        return [CN_prime[0](x), CN_prime[1](x)]

x     h      f_h        C_{3,[x±h]}   C_{5,[x±h]}
0.5   1e-3   2e-6       2e-6          2e-14
      1e-4   1.9e-8     1.9e-8        2.6e-13
      1e-5   1.9e-10    1.9e-10       1.8e-12
0.0   1e-3   5e-10      5e-10         1.4e-10
      1e-4   5e-13      5e-13         2.6e-13
      1e-5   5e-16      5e-16         1.5e-16
Table 1: Errors for f1
As one can see, we also included more general derivatives based on directional derivatives, like subgradients. It would also be possible to define a weak derivative, motivated by Chebyshev convergence theory, by $f'(x) := \frac{1}{2}\left(f'(x^-) + f'(x^+)\right)$.
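The helper routines in the pseudocode above are not specified further in the text; one possible NumPy-based realization is sketched below. The halving strategy in make_h_smaller and the open-interval convention in intersection are our own assumptions.

import numpy as np
from numpy.polynomial import Chebyshev
from numpy.polynomial import chebyshev as C

def compute_interpolation(f, c, d, N):
    """Degree-N Chebyshev interpolant of f on [c, d] (Gauss-Lobatto grid)."""
    xk = C.chebpts2(N + 1)
    tk = 0.5 * (c + d) + 0.5 * (d - c) * xk
    return Chebyshev(C.chebfit(xk, f(tk), N), domain=[c, d])

def derive_polynomial(CN):
    """Exact derivative of the interpolation polynomial."""
    return CN.deriv()

def make_h_smaller(h):
    """One possible shrinking strategy (assumption): halve the step."""
    return 0.5 * h

def intersection(Y, interval):
    """Points of Y lying strictly inside the interval (endpoints may touch Y,
    as needed for the one-sided intervals in the subgradient branch)."""
    lo, hi = interval
    return {y for y in Y if lo < y < hi}

# Example call: derivative of |x| at x = 0.25 with Y = {0}
# print(compute_derivative(abs, 0.25, 0.1, {0.0}, 5, -1.0, 1.0))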
A question which still remains is what to do when the set $Y$ is unknown. This is still an open problem, but since one can observe the speed of decay of the coefficients $a_n$ (or $b_n$), it is possible to locate non-differentiable points. There are several works on this topic from spectral theory.
Now to the $M$-dimensional case: Consider a functional $f : \Omega \subseteq \mathbb{R}^M \to \mathbb{R}$ which is continuous and piecewise smooth on the domain $\Omega$. We can compute the directional derivative in a direction $h \in \mathbb{R}^M$ at a point $x \in \Omega$ by differentiating the one-dimensional function $g(t) := f(x + th)$. Hence the methods from the one-dimensional case can be carried over to the $M$-dimensional setting.
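A self-contained sketch of this reduction to one dimension (function names and default parameters are our own choices; here h denotes the interval half-length, not the direction), together with the gradient assembled from the canonical unit directions:

import numpy as np
from numpy.polynomial import Chebyshev
from numpy.polynomial import chebyshev as C

def directional_derivative(f, x, direction, h=1e-4, N=5):
    """Derivative of f at x along `direction`, obtained by differentiating the
    degree-N Chebyshev interpolant of g(t) = f(x + t*direction) on [-h, h] at t = 0."""
    g = lambda t: f(x + t * direction)
    xk = C.chebpts2(N + 1)                      # Gauss-Lobatto grid on [-1, 1]
    gk = np.array([g(h * s) for s in xk])       # samples on the mapped grid in [-h, h]
    CN = Chebyshev(C.chebfit(xk, gk, N), domain=[-h, h])
    return CN.deriv()(0.0)

def chebyshev_gradient(f, x, h=1e-4, N=5):
    """Gradient assembled from the directional derivatives along the unit vectors."""
    x = np.asarray(x, dtype=float)
    return np.array([directional_derivative(f, x, e, h, N) for e in np.eye(x.size)])

# Example: gradient of f(x) = x_0^2 + 3*x_1 at (1, 2) is (2, 3).
print(chebyshev_gradient(lambda x: x[0] ** 2 + 3 * x[1], [1.0, 2.0]))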
4 Examples
First we define the function $f_1$ given by (see also Figure 1):
$$f_1(x) = \begin{cases} x^4 & \text{if } x > 0, \\ 0 & \text{else.} \end{cases}$$
First we compare computing the first derivative with local Chebyshev approximation against finite differences given by
$$f'(x) \approx \frac{f(x + h) - f(x - h)}{2h} =: f_h(x),$$
by computing the values at $x = 0$ and $x = 0.5$. See Table 1 for the results. As we can observe, the errors of $f_h$ and $C_{3,[x \pm h]}$ are equal. This comes as no surprise, since the derivative of the three-point Chebyshev interpolation polynomial, evaluated at $x$, is exactly the central difference quotient.

Next we modify $f_1$ in the following way: for some $\varepsilon > 0$ and a standard normally distributed random variable $X$ we define $f_2$ by
$$f_2(x) = f_1(x) + \varepsilon X$$
and then try to differentiate it at $0.5$. The results can be seen in Table 2. The reported errors are maximal values after taking some samples (there is some room for improvement here, e.g. better statistics, but we get the picture). We observe the following behavior: while $f_h$ and $C_{3,[x \pm h]}$ behave as expected, making $h$ smaller does not improve the approximation quality of $C_{5,[x \pm h]}$ and $C_{7,[x \pm h]}$, but even worsens it. This can be explained by the fact that for smooth $f$ the function behaves locally like its lower-order Taylor polynomial, while the higher-order terms are negligible. This also means that computing a higher-order Chebyshev polynomial does not make sense, and we even get into more trouble. Hence a suitable $h$ for the Chebyshev approximation should be neither too small nor too big.

x     h      f_h       C_{3,[x±h]}   C_{5,[x±h]}   C_{7,[x±h]}
0.5   1e-1   2e-2      2e-2          5.5e-9        4.2e-8
      1e-2   2.4e-6    2e-4          4e-7          8e-7
      1e-3   2.4e-6    2e-6          1.8e-6        7.3e-6
      1e-4   1e-6      4e-6          2e-5          2e-5
      1e-5   2.8e-5    9e-5          2e-5          7e-4
      1e-6   7e-5      1e-3          4e-5          2e-3
Table 2: Errors for f2

Figure 1: Function f1
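A small sketch of this experiment (the noise level eps, the random seed, and the printed step sizes are assumptions; the exact error values vary from run to run):

import numpy as np
from numpy.polynomial import Chebyshev
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)
eps = 1e-8                                        # assumed noise level

f1 = lambda x: np.where(x > 0, x ** 4, 0.0)
f2 = lambda x: f1(x) + eps * rng.standard_normal(np.shape(x))

def cheb_derivative(f, x, h, N):
    """Derivative of the degree-N local Chebyshev interpolant on [x-h, x+h] at x."""
    xk = C.chebpts2(N + 1)
    CN = Chebyshev(C.chebfit(xk, f(x + h * xk), N), domain=[x - h, x + h])
    return CN.deriv()(x)

x, exact = 0.5, 4 * 0.5 ** 3
for h in [1e-1, 1e-2, 1e-3]:
    fd = (f2(x + h) - f2(x - h)) / (2 * h)        # central difference f_h
    c5 = cheb_derivative(f2, x, h, 4)             # 5-point local Chebyshev derivative
    print(h, abs(fd - exact), abs(c5 - exact))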
Finally we try the local Chebyshev method on the 2D Rosenbrock function
$$R_{a,b}(x) = (a - x_0)^2 + b(x_1 - x_0^2)^2,$$
with parameters $a = 1$ and $b = 100$, and try to find the minimum with the help of gradient methods. The argmin is $x^* = (1, 1)$. Additionally we add some disturbances $\delta(x)$ and $\varepsilon(x)$ in the interval [-1e-6, 1e-6]. While $\delta$ is a normally distributed random variable, $\varepsilon$ is a jump function which changes its sign (which is fatal for finite differences). As optimization method we used steepest descent with the Armijo rule and $\|\nabla f\| < 1\mathrm{e}{-3}$ as termination criterion. We used $h = 1\mathrm{e}{-6}$ for the finite difference method, and order-5 polynomials with $h = 1\mathrm{e}{-4}$ for the local Chebyshev method. The number of iterations and the computed results (rounded to 3 digits) can be seen in Table 3. The results suggest that computing the gradient with the local Chebyshev method makes the algorithm more robust. It was also observed that the step size $h$ should not be too small.

Function       Method               Iteration Numbers      Result
R_{a,b}        Exact Gradient       1427                   (0.999, 0.998)
R_{a,b}        Finite Differences   1427                   (0.999, 0.998)
R_{a,b}        Local Chebyshev      1428                   (0.999, 0.998)
R_{a,b} + δ    Finite Differences   19999 (interrupted)    (0.986, 0.973)
R_{a,b} + δ    Local Chebyshev      1090                   (0.996, 0.992)
R_{a,b} + ε    Finite Differences   19999 (interrupted)    (0.486, 0.234)
R_{a,b} + ε    Local Chebyshev      1565                   (0.999, 0.998)
Table 3: Iteration numbers for optimization
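For completeness, a compact sketch of such an optimization run (the starting point, the Armijo parameters sigma and beta, and the restated gradient routine are assumptions, not taken from the original experiments):

import numpy as np
from numpy.polynomial import Chebyshev
from numpy.polynomial import chebyshev as C

def chebyshev_gradient(f, x, h=1e-4, N=5):
    """Gradient via local Chebyshev differentiation of t -> f(x + t*e_i) at t = 0."""
    x = np.asarray(x, dtype=float)
    xk = C.chebpts2(N + 1)
    grad = np.empty(x.size)
    for i, e in enumerate(np.eye(x.size)):
        vals = np.array([f(x + h * s * e) for s in xk])
        CN = Chebyshev(C.chebfit(xk, vals, N), domain=[-h, h])
        grad[i] = CN.deriv()(0.0)
    return grad

def rosenbrock(x, a=1.0, b=100.0):
    """2D Rosenbrock function R_{a,b}."""
    return (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

def steepest_descent_armijo(f, grad, x0, tol=1e-3, max_iter=20000, sigma=1e-4, beta=0.5):
    """Steepest descent with Armijo backtracking (sigma, beta are assumed parameters)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(f, x)
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        while f(x - t * g) > f(x) - sigma * t * np.dot(g, g):
            t *= beta                              # backtracking line search
        x = x - t * g
    return x, k

x_opt, iters = steepest_descent_armijo(rosenbrock, chebyshev_gradient, x0=[-1.0, 1.0])
print(iters, x_opt)   # should approach the minimizer (1, 1)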
References
[1] Boyd, John P. Chebyshev and Fourier spectral methods. Courier Corporation, 2001.