7.6 Least squares approximation
#%config InlineBackend.figure_format = 'svg'
from pylab import *
from scipy.interpolate import barycentric_interpolate
7.6.1 Best approximation in 2-norm

The minimax approximation is the best approximation in the maximum norm: it minimizes the largest error over the whole interval. Computing it requires solving an optimization problem, and the Remez algorithm is expensive. Optimization problems can be solved by gradient-based methods, but we cannot employ these here because the maximum norm is not differentiable. To use gradient methods, let us change the norm to the $L^2$ norm
$$\norm{g}_2 = \left( \int_a^b |g(x)|^2 \ud x \right)^{1/2}, \qquad g \in \cts[a,b]$$

For a given $f \in \cts[a,b]$ and $n \ge 0$, define
$$M_n(f) = \inf_{r \in \poly_n} \norm{f - r}_2$$

Is there a best approximating polynomial $r_n^* \in \poly_n$, i.e.,

$$\norm{f - r_n^*}_2 = M_n(f)$$

Is it unique? How can we compute it?
Consider the function

$$f(x) = \ee^x, \qquad x \in [-1,1]$$

Let us find

$$r_1(x) = b_0 + b_1 x$$

by minimizing the $L^2$ error norm

$$\norm{f - r_1}_2^2 = \int_{-1}^1[\ee^x - b_0 - b_1 x]^2 \ud x =: F(b_0, b_1)$$

The optimality conditions

$$\df{F}{b_0} = \df{F}{b_1} = 0$$

yield the linear system of equations
$$\begin{align}
b_0 \int \ud x + b_1 \int x \ud x &= \int \ee^x \ud x \\
b_0 \int x \ud x + b_1 \int x^2 \ud x &= \int x \ee^x \ud x
\end{align}$$

whose solution is
$$\begin{aligned}
b_0 &= \half \int_{-1}^1 \ee^x \ud x = \sinh(1) \approx 1.1752 \\
b_1 &= \frac{3}{2} \int_{-1}^1 x \ee^x \ud x = \frac{3}{\ee} \approx 1.1036
\end{aligned}$$

so that

$$r_1^*(x) = 1.1752 + 1.1036 x, \qquad \norm{f - r_1^*}_\infty \approx 0.44$$
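These values and the max-norm error can be checked numerically; here is a minimal sketch using scipy.integrate.quad (assuming scipy is available alongside the pylab import above).

# Sketch: verify b0, b1 and the max-norm error of the linear least squares fit
from scipy.integrate import quad
b0 = 0.5*quad(lambda x: exp(x), -1, 1)[0]    # = sinh(1), about 1.1752
b1 = 1.5*quad(lambda x: x*exp(x), -1, 1)[0]  # = 3/e,     about 1.1036
xx = linspace(-1, 1, 1000)
print(b0, b1, abs(exp(xx) - (b0 + b1*xx)).max())  # max error about 0.44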
Compare this with

$$\begin{align}
\textrm{minimax:} & \qquad \norm{f - p_1^*}_\infty \approx 0.2788 \\
\textrm{Taylor:} & \qquad \norm{f - p_1}_\infty \approx 0.7182
\end{align}$$

We have the following ordering of errors in the max norm:

$$\textrm{Error of minimax $\lt$ Error of least squares $\lt$ Error of Taylor}$$
Example 2 (Cubic least squares)

It is given by

$$r_3^*(x) = 0.996294 + 0.997955 x + 0.536722 x^2 + 0.176139 x^3$$

r3 = lambda x: 0.996294 + 0.997955*x + 0.536722*x**2 + 0.176139*x**3
x = linspace(-1,1,100)
plot(x,exp(x)-r3(x))
title('Error of cubic least squares approximation of exp(x)')
grid(True), xlabel('x');
The error oscillates in sign but is somewhat larger near the end points, with

$$\norm{f - r_3^*}_\infty \approx 0.0112$$

This error is similar to the error of the cubic interpolating polynomial.
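This value can be checked with the r3 defined above by sampling the error on a fine grid (a quick sketch, reusing the pylab import).

# Sketch: estimate the max-norm error of the cubic least squares fit
xx = linspace(-1, 1, 2001)
print(abs(exp(xx) - r3(xx)).max())   # approximately 0.0112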
7.6.2 General least squares problem

Let there be given a weight function

$$w(x) \ge 0, \qquad x \in [a,b]$$

such that

$$\int_a^b |x|^n w(x) \ud x < \infty, \qquad \forall n \ge 0$$

and if $\int_a^b w(x) g(x) \ud x = 0$ for some non-negative continuous function $g$, then $g(x) \equiv 0$ on $(a,b)$.
Examples of commonly used weight functions are

$$\begin{aligned}
w(x) &= 1, \qquad\qquad a \le x \le b \\
w(x) &= \frac{1}{\sqrt{1-x^2}}, \quad -1 \le x \le +1 \\
w(x) &= \ee^{-x}, \qquad\quad 0 \le x < \infty \\
w(x) &= \ee^{-x^2}, \quad\quad -\infty < x < \infty
\end{aligned}$$

7.6.2.1 Least squares problem

Given $f \in \cts[a,b]$, find the polynomial $r_n^* \in \poly_n$ which minimizes

$$\int_a^b w(x)[f(x) - r(x)]^2 \ud x$$

among all polynomials $r \in \poly_n$.
Does it exist? Is it unique? How do we find it?

7.6.3 Least squares using monomials

Find the coefficients in the polynomial

$$r_n(x) = \sum_{j=0}^n a_j x^j, \qquad x \in [a,b] = [0,1]$$

so that, with weight function $w(x) \equiv 1$,

$$F(a_0,a_1,\ldots,a_n) := \int_0^1 [f(x) - r_n(x)]^2 \ud x$$

is minimized. The necessary optimality conditions are

$$\df{F}{a_i} = 0 \quad\implies\quad \sum_{j=0}^n a_j \int_0^1 x^{i+j} \ud x = \int_0^1 f(x) x^i \ud x, \qquad i=0,1,\ldots,n$$

This is a set of $n+1$ coupled linear equations
$$Aa = b, \qquad A_{ij} = \frac{1}{i+j+1}, \qquad 0 \le i,j \le n$$

$A$ is called the Hilbert matrix and it is highly ill-conditioned,

$$\cond(A) = \order{ \frac{(1 + \sqrt{2})^{4n+4}}{\sqrt{n+1}} }$$

The monomial basis $1,x,x^2,\ldots,x^n$ is linearly independent, but for large powers $m,n$ the functions $x^m, x^n$ are nearly identical, which causes the bad condition numbers.
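This growth is easy to observe numerically. The matrix above is the $(n+1) \times (n+1)$ Hilbert matrix, available as scipy.linalg.hilbert; a small sketch (assuming scipy):

# Sketch: condition number of the Hilbert matrix grows rapidly with n
from scipy.linalg import hilbert
from numpy.linalg import cond
for n in range(2, 13, 2):
    print(n, cond(hilbert(n+1)))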
Note that we can write the matrix elements as

$$A_{ij} = \int_0^1 \phi_i(x) \phi_j(x) \ud x, \qquad \phi_i(x) = x^i$$

If we could make the matrix diagonal, i.e.,

$$A_{ij} = 0, \qquad i \ne j$$

then its solution would become trivial and we would not need to worry about ill-conditioning. We should therefore use a set of basis functions with this property, instead of the monomials.
7.6.4 Orthogonal polynomials

Let $w : (a,b) \to (0,\infty)$ be a weight function and define the inner product

$$\ip{f,g} = \int_a^b w(x) f(x) g(x) \ud x$$

The inner product satisfies these properties.

(1) $\ip{\alpha f, g} = \ip{f,\alpha g} = \alpha \ip{f,g}$ for all $\alpha \in \re$

(2) $\ip{f_1 + f_2, g} = \ip{f_1,g} + \ip{f_2,g}$ and $\ip{f,g_1+g_2} = \ip{f,g_1} + \ip{f,g_2}$

(3) $\ip{f,g} = \ip{g,f}$

(4) $\ip{f,f} \ge 0$ for all $f \in \cts[a,b]$, and $\ip{f,f} = 0$ iff $f = 0$
Such an inner product gives rise to a norm

$$\norm{f} = \sqrt{ \ip{f,f} } = \sqrt{ \int_a^b w(x) |f(x)|^2 \ud x }$$

Moreover, we have the Cauchy-Schwarz inequality

$$|\ip{f,g}| \le \norm{f} \cdot \norm{g}$$

Definition 1 (Orthogonal functions)

Two functions $f,g$ are said to be orthogonal with respect to the inner product $\ip{\cdot,\cdot}$ if

$$\ip{f,g} = 0$$

Theorem 1 (Orthogonal polynomials)
There exists a sequence of polynomials $\{ \phi_n : n \ge 0\}$ with degree$(\phi_n) = n$ and

$$\ip{\phi_n, \phi_m} = 0, \qquad n \ne m$$

Moreover, the sequence can be constructed so that

(1) $\ip{\phi_n,\phi_n} = 1$ for all $n$;

(2) the coefficient of $x^n$ in $\phi_n$ is positive;

and with these normalizations the sequence $\{ \phi_n \}$ is unique.
The proof is constructive and uses the Gram-Schmidt orthogonalization process.
(1) Start with the constant polynomial $\phi_0(x) = c$ and normalize it,

$$1 = \ip{\phi_0,\phi_0} = c^2 \int_a^b w(x) \ud x$$

which fixes the constant

$$c = \left( \int_a^b w(x) \ud x \right)^{-1/2}$$

(2) To construct $\phi_1(x)$, start with
$$\psi_1(x) := x + a_{10} \phi_0(x)$$

Make it orthogonal to $\phi_0$:

$$\ip{\psi_1, \phi_0} = 0 \quad\implies\quad \ip{x,\phi_0} + a_{10} \underbrace{\ip{\phi_0,\phi_0}}_{=1} = 0 \quad\implies\quad a_{10} = -\ip{x,\phi_0}$$

Now normalize to define the next function:

$$\phi_1(x) := \frac{1}{\norm{\psi_1}} \psi_1(x) \quad\implies\quad \ip{\phi_1,\phi_1} = 1$$

(3) Suppose $\phi_0, \phi_1, \ldots, \phi_{n-1}$ have been found. For the next function, which is of degree $n$, define

$$\psi_n(x) = x^n + a_{n,n-1} \phi_{n-1}(x) + \ldots + a_{n,0} \phi_0(x)$$

Determine the coefficients by making $\psi_n$ orthogonal to all of $\phi_0, \phi_1, \ldots, \phi_{n-1}$. For $j=0,1,\ldots,n-1$
$$\begin{align}
\ip{\psi_n,\phi_j} = 0 \quad\implies & \quad \ip{x^n,\phi_j} + a_{n,j} \underbrace{\ip{\phi_j,\phi_j}}_{=1} = 0 \\
\quad\implies & \quad a_{n,j} = -\ip{x^n, \phi_j} \\
\phi_n(x) = \frac{1}{\norm{\psi_n}} \psi_n(x) \quad\implies & \quad \ip{\phi_n,\phi_n} = 1
\end{align}$$
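The construction is easy to carry out symbolically. The following sketch uses sympy (an assumption, not otherwise used in these notes) with $w(x) = 1$ on $[-1,1]$; it reproduces the first few orthonormal Legendre polynomials given later.

# Sketch: Gram-Schmidt on 1, t, t^2, ... with w(t) = 1 on [-1,1] (symbolic)
from sympy import symbols, integrate, sqrt, simplify
t = symbols('t')
ip = lambda f, g: integrate(f*g, (t, -1, 1))       # inner product, w(t) = 1
phi = []
for n in range(4):
    psi = t**n - sum(ip(t**n, p)*p for p in phi)   # subtract projections on phi_j
    phi.append(simplify(psi / sqrt(ip(psi, psi)))) # normalize
print(phi)   # [sqrt(2)/2, sqrt(6)*t/2, ...]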
Lemma 1 (Orthonormal basis for polynomials)

(1) The orthonormal set of polynomials

$$V_n = \{\phi_0, \phi_1, \ldots, \phi_n\}$$

is linearly independent and forms a basis for $\poly_n$.

(2) Any polynomial $p \in \poly_n$ can be written as

$$p = \sum_{j=0}^n a_j \phi_j, \qquad a_j = \ip{p,\phi_j}$$

Proof. (1) Let
$$\sum_{j=0}^n c_j \phi_j(x) = 0, \quad x \in [a,b]$$

Taking the inner product with $\phi_i$, $0 \le i \le n$, and using orthogonality,

$$c_i \ip{\phi_i, \phi_i} = 0 \quad\implies\quad c_i = 0$$

This proves the linear independence of $V_n$.
(2) In the proof of the Gram-Schmidt theorem, we had the intermediate result

$$x^j = \norm{\psi_j} \phi_j(x) - (a_{j,j-1} \phi_{j-1}(x) + \ldots + a_{j,0} \phi_0(x)), \quad 1 \le j \le n$$

Hence any $p \in \poly_n$ can be written as a linear combination of $V_n$.

Alternate proof: $\poly_n$ is of dimension $n+1$ and the set $V_n$ has $n+1$ linearly independent elements of $\poly_n$. Hence they form a basis for $\poly_n$.

Finally, writing any $p \in \poly_n$ as

$$p = \sum_{j=0}^n a_j \phi_j$$

and taking the inner product with $\phi_i$ yields $a_i = \ip{p,\phi_i}$.
Let $\{\phi_j, j=0,1,\ldots\}$ be an orthogonal family of polynomials with degree$(\phi_j) = j$. For any $p \in \poly_n$,

$$\ip{p,\phi_m} = 0, \qquad m=n+1,n+2,\ldots$$

7.6.5 Legendre polynomials

Take $[a,b] = [-1,1]$ and weight function
$$w(x) \equiv 1$$

The resulting orthogonal polynomials are the Legendre polynomials; the first few orthonormal functions are

$$\phi_0(x) = \frac{1}{\sqrt{2}}, \quad \phi_1(x) = \sqrt{\frac{3}{2}} x, \quad \phi_2(x) = \half \sqrt{\frac{5}{2}} (3x^2 - 1)$$

They are usually defined in terms of the Rodrigues formula

$$P_0(x)=1, \qquad P_n(x) = \frac{(-1)^n}{2^n n!} \dd{^n}{x^n}(1-x^2)^n, \qquad n \ge 1$$

which are polynomial solutions of Legendre's differential equation

$$\dd{}{x}\left[(1-x^2) \dd{P_n}{x}\right] + n(n+1)P_n = 0, \qquad P_n(1) = 1$$

The $P_n$ satisfy the orthogonality conditions

$$\ip{P_n,P_n} = \frac{2}{2n+1}, \qquad \ip{P_n,P_m} = 0, \quad n \ne m$$

and the recurrence relation

$$P_{n+1}(x) = \frac{2n+1}{n+1} x P_n(x) - \frac{n}{n+1} P_{n-1}(x)$$

The orthonormal functions are

$$\phi_n(x) = \sqrt{\frac{2n+1}{2}} P_n(x)$$
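These relations are easy to verify numerically, e.g. with scipy.special.eval_legendre (a sketch, assuming scipy):

# Sketch: check (P_n, P_n) = 2/(2n+1) and (P_n, P_m) = 0 for n != m
from scipy.special import eval_legendre
from scipy.integrate import quad
for n in range(4):
    for m in range(n, 4):
        val = quad(lambda x: eval_legendre(n, x)*eval_legendre(m, x), -1, 1)[0]
        print(n, m, val)   # 2/(2n+1) when n == m, close to 0 otherwise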
7.6.6 Chebyshev polynomials

Take $[a,b]=[-1,1]$ and weight function

$$w(x) = \frac{1}{\sqrt{1-x^2}}$$

The resulting sequence of orthogonal polynomials can be written in terms of the Chebyshev polynomials

$$\begin{align}
T_0(x) &= 1 \\
T_1(x) &= x \\
T_{n+1}(x) &= 2 x T_n(x) - T_{n-1}(x), \quad n \ge 1
\end{align}$$

or
$$T_n(x) = \cos(n \cos^{-1}x)$$

They are orthogonal with respect to the above weight:

$$\ip{T_n, T_m} = \begin{cases}
0 & n \ne m \\
\pi & m = n = 0 \\
\half\pi & m = n > 0
\end{cases}$$

Hence the orthonormal sequence is

$$\phi_0(x) = \frac{1}{\sqrt{\pi}}, \qquad \phi_n(x) = \sqrt{\frac{2}{\pi}} T_n(x), \quad n \ge 1$$
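The weighted orthogonality is easiest to check numerically after the substitution $x = \cos\theta$, which removes the singular weight; a small sketch:

# Sketch: with x = cos(t), (T_n, T_m) becomes the integral of cos(n t) cos(m t) over [0, pi]
from numpy import cos, pi
from scipy.integrate import quad
for n in range(3):
    for m in range(n, 3):
        val = quad(lambda t: cos(n*t)*cos(m*t), 0, pi)[0]
        print(n, m, val)   # pi if n = m = 0, pi/2 if n = m > 0, ~0 otherwise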
7.6.7 Laguerre polynomials

Take $[a,b) = [0,\infty)$ and weight function

$$w(x) = \ee^{-x}, \qquad x \in [0,\infty)$$

The resulting sequence of orthogonal polynomials is called the Laguerre polynomials, usually written as

$$L_n(x) = \frac{\ee^x}{n!} \dd{^n}{x^n}(x^n \ee^{-x}), \qquad n \ge 0$$

which are already orthonormal, so $\phi_n(x) = L_n(x)$. Moreover, they obey the recurrence relation

$$L_{n+1}(x) = \frac{2n + 1 - x}{n+1} L_n(x) - \frac{n}{n+1} L_{n-1}(x)$$
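Orthonormality on the half line can be checked with quad, which handles the infinite limit (a sketch using scipy.special.eval_laguerre):

# Sketch: check (L_n, L_m) = delta_{nm} with weight exp(-x) on [0, infinity)
from numpy import exp, inf
from scipy.special import eval_laguerre
from scipy.integrate import quad
for n in range(3):
    for m in range(n, 3):
        val = quad(lambda x: exp(-x)*eval_laguerre(n, x)*eval_laguerre(m, x), 0, inf)[0]
        print(n, m, val)   # ~1 when n == m, ~0 otherwise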
7.6.8 Least squares approximation

Recall that we are trying to minimize $\norm{f - r}$ with respect to $r \in \poly_n$, where the norm comes from a weighted inner product. The solution of this problem is simplified if we express $r$ in terms of the orthonormal polynomials $\{ \phi_j \}$ corresponding to the chosen inner product. So let

$$r(x) = \sum_{j=0}^n b_j \phi_j(x)$$

Then for a given $f \in \cts[a,b]$,
$$\norm{f-r}^2 = \int_a^b w(x) \left[ f(x) - \sum_{j=0}^n b_j \phi_j(x) \right]^2 \ud x =: G(b_0, b_1, \ldots,b_n)$$

is to be minimized with respect to the parameters $b_0, b_1, \ldots, b_n$. Now

$$\begin{aligned}
0
&\le G(b_0, b_1, \ldots, b_n) \\
&= \ip{ f - \sum_{j=0}^n b_j\phi_j, f - \sum_{i=0}^n b_i \phi_i} \\
&= \ip{f,f} - 2 \sum_{j=0}^n b_j \ip{f,\phi_j} + \sum_{i=0}^n \sum_{j=0}^n b_i b_j \underbrace{\ip{\phi_i, \phi_j}}_{\delta_{ij}} \\
&= \norm{f}^2 - 2 \sum_{j=0}^n b_j \ip{f,\phi_j} + \sum_{j=0}^n b_j^2 \\
&= \norm{f}^2 - \sum_{j=0}^n \ip{f,\phi_j}^2 + \sum_{j=0}^n [\ip{f,\phi_j} - b_j]^2
\end{aligned}$$

The first two terms do not depend on the $\{b_j\}$ and the third term is non-negative. Hence $G$ is minimized iff the third term is zero, which implies that

$$b_j = \ip{f,\phi_j}, \qquad j=0,1,\ldots,n$$

is the unique solution of the minimization problem. Hence the least squares approximation exists, is unique, and is given by

$$r_n^*(x) = \sum_{j=0}^n \ip{f,\phi_j} \phi_j(x)$$

Note that we made use of the inner product structure of the norm in this proof. This is a special version of the projection theorem in Hilbert spaces; see e.g. (Brezis, 2010, Theorem 5.2, Corollary 5.4) or (Kreyszig, 1978, Sections 3.3-1 and 3.3-2).
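As an illustration, here is a sketch (assuming scipy) that computes the coefficients $b_j = \ip{f,\phi_j}$ for $f(x) = \ee^x$ in the orthonormal Legendre basis of Section 7.6.5 and evaluates the resulting least squares polynomial; with n = 3 it should reproduce the cubic $r_3^*$ and the error $\approx 0.0112$ seen earlier.

# Sketch: least squares approximation of exp(x) in the orthonormal Legendre basis
from numpy import exp, sqrt, linspace, abs
from scipy.special import eval_legendre
from scipy.integrate import quad

n   = 3
phi = lambda j, x: sqrt((2*j + 1)/2) * eval_legendre(j, x)       # orthonormal basis
b   = [quad(lambda x: exp(x)*phi(j, x), -1, 1)[0] for j in range(n+1)]

xx = linspace(-1, 1, 1000)
rn = sum(b[j]*phi(j, xx) for j in range(n+1))
print(abs(exp(xx) - rn).max())   # max-norm error, approximately 0.0112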
We can also find the minimizer using calculus. The necessary condition for an extremum is

$$\df{G}{b_i} = -2 \int_a^b w(x) \left[ f(x) - \sum_{j=0}^n b_j \phi_j(x) \right] \phi_i(x) \ud x = 0$$

Using orthonormality, we get the unique solution of the optimality conditions as

$$\begin{align}
\int_a^b w(x) f(x) \phi_i(x) \ud x &= \sum_{j=0}^n b_j \int_a^b w(x) \phi_i(x) \phi_j(x) \ud x = b_i \\
\ip{f, \phi_i} &= b_i
\end{align}$$

The Hessian

$$\df{^2 G}{b_i \partial b_j} = 2 \int_a^b w(x) \phi_i(x) \phi_j(x) \ud x = 2 \delta_{ij}$$

is positive definite, so the extremum is the unique minimum.
As $n \to \infty$, the best approximations in 2-norm converge to the function.

Assume $[a,b]$ is finite and $f \in \cts[a,b]$. Then

$$\lim_{n \to \infty} \norm{f - r_n^*} = 0$$

(1) Since $r_n^*$ is the minimizer,

$$\norm{f - r_n^*} = \min_{r \in \poly_n} \norm{f - r}$$

we have $\norm{f - r_n^*} \ge \norm{f - r_{n+1}^*}$ since $\poly_n \subset \poly_{n+1}$. Hence $\norm{f - r_n^*}$ is a non-increasing sequence. But does it decrease to zero?
(2) Let $\epsilon > 0$ be arbitrary. By the Weierstrass theorem, there is a polynomial $Q_m$ of some degree $m \ge 0$ for which

$$\max_{x \in [a,b]}|f(x) - Q_m(x)| \le \frac{\epsilon}{c}, \qquad c = \left( \int_a^b w(x) \ud x \right)^\half$$

By optimality of $r_m^*$,
$$\begin{aligned}
\norm{f - r_m^*}
&\le \norm{f - Q_m} \\
&= \left( \int_a^b w(x)|f(x) - Q_m(x)|^2 \ud x \right)^\half \\
&\le \left( \int_a^b w(x) \frac{\epsilon^2}{c^2} \ud x \right)^\half \\
&= \epsilon
\end{aligned}$$

Hence, from (1),

$$\norm{f - r_n^*} \le \epsilon, \qquad \forall n \ge m$$

Since $\epsilon$ was arbitrary, this proves the theorem.
We proved convergence in the $L^2$ norm, but this does not imply that $\norm{f - r_n^*}_\infty \to 0$. However, if additional assumptions on the differentiability of $f$ are made, then we can prove that $r_n^*$ converges pointwise to $f$. The continuity requirement can also be relaxed for convergence in the 2-norm; see the next section.
7.6.9 Chebyshev least squares approximation

The best approximation in this basis is given by

$$C_n(x) = \sum_{j=0}^n{}' a_j T_j(x), \qquad a_j = \frac{2}{\pi} \int_{-1}^1 \frac{f(x) T_j(x)}{\sqrt{1-x^2}} \ud x$$

where the prime indicates that the first term must be multiplied by $\half$. Let us make the change of variable

$$x = \cos\theta, \qquad \theta \in [0,\pi]$$

Then $T_j(x) = \cos(j\cos^{-1}x) = \cos(j\theta)$ and

$$C_n(\cos\theta) = \sum_{j=0}^n{}' a_j \cos(j\theta), \qquad a_j = \frac{2}{\pi} \int_0^\pi \cos(j\theta) f(\cos\theta) \ud\theta$$

Define

$$F(\theta) = f(\cos\theta), \qquad \theta \in [0,\pi]$$

and extend it to $[-\pi,0]$ as an even function,

$$F(\theta) = f(\cos\theta), \qquad \theta \in [-\pi,\pi]$$

The Chebyshev series becomes a Fourier cosine series, compare to (8), (2),

$$S_n F(\theta) = \sum_{j=0}^n{}' a_j \cos(j\theta), \qquad a_j = \frac{1}{\pi} \int_{-\pi}^\pi \cos(j\theta) F(\theta) \ud\theta$$

Since $F(\theta)$ is even, the sine terms vanish, and this is also the full Fourier series.
Now if $f(x)$ is piecewise continuous, so is $F(\theta)$, and we know that

$$|a_j| = \order{\frac{1}{j}}, \qquad j \to \infty$$

In this case, we have $\norm{f - C_n}_2 \to 0$.

If $f \in \cts^{\nu-1}[-1,1]$ for some $\nu \ge 1$ and $f^{(\nu)}$ is piecewise continuous, then $F \in \cts^{\nu-1}_p[-\pi,\pi]$ and $F^{(\nu)}$ is piecewise continuous. This means that

$$|a_j| = \order{\frac{1}{j^{\nu+1}}}, \qquad j \to \infty$$

In this case, we also get $\norm{f - C_n}_\infty \to 0$.

Note that we do not require $f^{(s)}(-1) = f^{(s)}(1)$; because of the way $F(\theta)$ is constructed, it satisfies $F^{(s)}(-\pi) = F^{(s)}(\pi)$ if $f \in \cts^s[-1,1]$.
7.6.10 Chebyshev least squares and minimax

Suppose $f$ is very smooth, say $f^{(\nu)}$ is of bounded variation for some $\nu \gg 1$. Then $|a_j| = \order{1/j^{\nu+1}}$ and

$$f(x) - C_n(x) = \sum_{j=n+1}^\infty a_j T_j(x) \approx a_{n+1} T_{n+1}(x)$$

Now

$$|T_{n+1}(x)| \le 1, \qquad x \in [-1,+1]$$

and at the $n+2$ points

$$x_j = \cos\left( \frac{j\pi}{n+1} \right), \quad j=0,1,2,\ldots,n+1$$

it takes the extreme values

$$T_{n+1}(x_j) = (-1)^j$$

The error

$$f(x_j) - C_n(x_j) \approx a_{n+1} (-1)^j$$

is approximately equi-oscillating at these $n+2$ points. This indicates that the Chebyshev least squares approximation can be expected to be close to the minimax approximation, $C_n \approx p_n^*$.
For the function $f(x) = \ee^x$, $x \in [-1,1]$,

$$\begin{align}
f(x) - C_3(x) &\approx a_4 T_4(x), \qquad a_4 = 0.00547 \\
\norm{f - C_3}_\infty &= 0.00607 \\
E_3(f) &= \norm{f - p_3^*}_\infty = 0.00553 \approx a_4
\end{align}$$

In general,

$$\norm{f - p_n^*}_\infty \le \norm{f - C_n}_\infty \le \left( 4 + \frac{4}{\pi^2} \log n \right) \norm{f - p_n^*}_\infty$$
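These numbers can be reproduced by computing the coefficients $a_j$ through the $\theta$-integral above and evaluating $C_3$; a sketch (assuming scipy and numpy.polynomial):

# Sketch: Chebyshev least squares coefficients of exp(x) and the error of C_3
from numpy import exp, cos, pi, linspace, abs
from scipy.integrate import quad
from numpy.polynomial.chebyshev import chebval

n = 3
a = [(2/pi)*quad(lambda t: cos(j*t)*exp(cos(t)), 0, pi)[0] for j in range(n+2)]
print(a[n+1])                      # a_4, approximately 0.00547

c = a[:n+1]
c[0] = 0.5*c[0]                    # the primed sum halves the first term
xx = linspace(-1, 1, 2000)
print(abs(exp(xx) - chebval(xx, c)).max())   # approximately 0.00607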
7.6.11 Chebyshev interpolation and minimax

If $f$ is very smooth, then

$$f(x) - C_n(x) \approx a_{n+1} T_{n+1}(x)$$

The error at the $n+1$ roots of $T_{n+1}$, given by the Chebyshev points of the first kind

$$x_j = \cos\left( \frac{2j + 1}{2n + 2} \pi \right), \qquad j=0,1,2,\ldots,n$$

is nearly zero.
Let us construct the polynomial $I_n(x)$ which interpolates $f(x)$ at the $n+1$ roots of $T_{n+1}(x)$. Then

$$\begin{aligned}
I_n(x)
&= \sum_{j=0}^n f(x_j) \ell_j(x) \\
&= \sum_{j=0}^n C_n(x_j) \ell_j(x) + \sum_{j=0}^n \underbrace{[f(x_j) - C_n(x_j)]}_{\approx a_{n+1}T_{n+1}(x_j) = 0} \ell_j(x) \\
&\approx \sum_{j=0}^n C_n(x_j) \ell_j(x) \\
&= C_n(x) \\
&\approx p_n^*(x)
\end{aligned}$$

Thus Chebyshev interpolation at the first-kind points can be expected to be close to the best approximation. We observed this in Example 2. Recall from Section 6.2.9 that these Chebyshev nodes minimize the node polynomial appearing in the polynomial interpolation error formula.
$$\norm{f - I_n}_\infty \le \left( 2 + \frac{2}{\pi} \log(n+1) \right) \norm{f - p_n^*}_\infty$$

For $n = 100$,

$$\norm{f - I_{100}}_\infty \le 3.275 \norm{f - p_{100}^*}_\infty$$

The Chebyshev-I interpolation error is at worst about three times larger than the best approximation error. But for a smooth function, $\norm{f - p_n^*}_\infty$ itself will be very small, in which case interpolation is also very accurate.
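This is easy to try with the barycentric_interpolate imported at the top of this section: interpolate $f(x) = \ee^x$ at the Chebyshev points of the first kind and measure the error (a sketch).

# Sketch: interpolate exp(x) at the n+1 roots of T_{n+1} (Chebyshev points, first kind)
from numpy import exp, cos, pi, arange, linspace, abs
from scipy.interpolate import barycentric_interpolate

n  = 3
j  = arange(n+1)
xj = cos((2*j + 1)/(2*n + 2)*pi)                    # roots of T_{n+1}
xx = linspace(-1, 1, 2000)
err = abs(exp(xx) - barycentric_interpolate(xj, exp(xj), xx))
print(err.max())   # comparable to the best approximation error E_3(f) ~ 0.0055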
Clearly, interpolation is much easier than running the Remez algorithm to find the best approximation.
Brezis, H. (2010). Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer New York. doi:10.1007/978-0-387-70914-7

Kreyszig, E. (1978). Introductory Functional Analysis with Applications. John Wiley & Sons, Inc.