
7.6 Least squares approximation

```python
#%config InlineBackend.figure_format = 'svg'
from pylab import *
from scipy.interpolate import barycentric_interpolate
```

7.6.1 Best approximation in 2-norm

The minimax approximation is the best approximation in the maximum norm, since it minimizes the largest error over the interval. Computing it requires solving an optimization problem, and the Remez algorithm is expensive. Optimization problems are often solved by gradient-based methods, but the maximum norm is not differentiable, so such methods cannot be applied directly. To use gradient methods, let us change the norm to the $L^2$ norm

$$\norm{g}_2 = \left( \int_a^b |g(x)|^2 \ud x \right)^{1/2}, \qquad g \in \cts[a,b]$$

For a given $f \in \cts[a,b]$ and $n \ge 0$, define

$$M_n(f) = \inf_{r \in \poly_n} \norm{f - r}_2$$

  1. Is there a best approximating polynomial $r_n^* \in \poly_n$, i.e.,

     $$\norm{f - r_n^*}_2 = M_n(f)$$

  2. Is it unique?

  3. How can we compute it?

7.6.2 General least squares problem

Let there be given a weight function

$$w(x) \ge 0, \qquad x \in [a,b]$$

such that

  1. $\int_a^b |x|^n w(x) \ud x < \infty$ for all $n \ge 0$

  2. If $\int_a^b w(x) g(x) \ud x = 0$ for some non-negative continuous function $g$, then $g(x) \equiv 0$ on $(a,b)$.

Examples of commonly used weight functions are

$$\begin{aligned} w(x) &= 1, \qquad\qquad a \le x \le b \\ w(x) &= \frac{1}{\sqrt{1-x^2}}, \quad -1 \le x \le +1 \\ w(x) &= \ee^{-x}, \qquad\quad 0 \le x < \infty \\ w(x) &= \ee^{-x^2}, \quad\quad -\infty < x < \infty \end{aligned}$$

7.6.2.1 Least squares problem

Given $f \in \cts[a,b]$, find the polynomial $r_n^* \in \poly_n$ which minimizes

$$\int_a^b w(x) [f(x) - r(x)]^2 \ud x$$

among all polynomials $r \in \poly_n$.

  1. Does it exist?
  2. Is it unique?
  3. How to find it?

7.6.3 Least squares using monomials

Find the coefficients in the polynomial

$$r_n(x) = \sum_{j=0}^n a_j x^j, \qquad x \in [a,b] = [0,1]$$

so that, with weight function $w(x) \equiv 1$,

$$F(a_0, a_1, \ldots, a_n) := \int_0^1 [f(x) - r_n(x)]^2 \ud x$$

is minimized. The necessary optimality conditions are

$$\df{F}{a_i} = 0 \quad\implies\quad \sum_{j=0}^n a_j \int_0^1 x^{i+j} \ud x = \int_0^1 f(x) x^i \ud x, \qquad i=0,1,\ldots,n$$

This is a set of $n+1$ coupled linear equations

$$Aa = b, \qquad A_{ij} = \frac{1}{i+j+1}, \qquad 0 \le i,j \le n$$

$A$ is called the Hilbert matrix and it is highly ill-conditioned,

$$\cond(A) = \order{ \frac{(1 + \sqrt{2})^{4n+4}}{\sqrt{n+1}} }$$

The monomial basis $1, x, x^2, \ldots, x^n$ is linearly independent, but for large powers $m, n$ the functions $x^m, x^n$ are nearly identical, which causes the bad condition number. Note that we can write the matrix elements as

$$A_{ij} = \int_0^1 \phi_i(x) \phi_j(x) \ud x, \qquad \phi_i(x) = x^i$$

If we could make the matrix diagonal, i.e.

$$A_{ij} = 0, \qquad i \ne j$$

then its solution would become trivial and we would not need to worry about ill-conditioning. We should use a set of basis functions which has this property, instead of the monomials.
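Before moving on, here is a small numerical sketch (the matrix sizes are only illustrative) showing how quickly the Hilbert matrix becomes ill-conditioned, compared with the estimate above.

```python
# Sketch: condition number of the Hilbert matrix A_ij = 1/(i+j+1), 0 <= i,j <= n,
# compared with the asymptotic estimate (1+sqrt(2))^(4n+4)/sqrt(n+1).
import numpy as np

for n in range(2, 11, 2):
    i = np.arange(n + 1)
    A = 1.0 / (i[:, None] + i[None, :] + 1)              # (n+1) x (n+1) Hilbert matrix
    estimate = (1 + np.sqrt(2))**(4*n + 4) / np.sqrt(n + 1)
    print(f"n = {n:2d}, cond(A) = {np.linalg.cond(A):.2e}, estimate = {estimate:.2e}")
```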

7.6.4 Orthogonal polynomials

Let $w : (a,b) \to (0,\infty)$ be a weight function and define the inner product

$$\ip{f,g} = \int_a^b w(x) f(x) g(x) \ud x$$

The inner product satisfies these properties.

  1. $\ip{\alpha f, g} = \ip{f, \alpha g} = \alpha \ip{f,g}$ for all $\alpha \in \re$

  2. $\ip{f_1 + f_2, g} = \ip{f_1,g} + \ip{f_2,g}$ and $\ip{f, g_1 + g_2} = \ip{f,g_1} + \ip{f,g_2}$

  3. $\ip{f,g} = \ip{g,f}$

  4. $\ip{f,f} \ge 0$ for all $f \in \cts[a,b]$ and $\ip{f,f} = 0$ iff $f = 0$

Such an inner product gives rise to a norm

$$\norm{f} = \sqrt{\ip{f,f}} = \sqrt{\int_a^b w(x) |f(x)|^2 \ud x}$$

Moreover, we have the Cauchy-Schwarz inequality

$$|\ip{f,g}| \le \norm{f} \cdot \norm{g}$$
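These definitions are easy to check numerically. Below is a minimal sketch using scipy's `quad` to evaluate the inner product; the weight $w(x) = 1$ on $[0,1]$ and the functions $f, g$ are illustrative choices only.

```python
# Sketch: weighted inner product via quadrature and a Cauchy-Schwarz check.
import numpy as np
from scipy.integrate import quad

def ip(f, g, w=lambda x: 1.0, a=0.0, b=1.0):
    """Inner product (f,g) = int_a^b w(x) f(x) g(x) dx."""
    val, _ = quad(lambda x: w(x) * f(x) * g(x), a, b)
    return val

f = lambda x: np.exp(x)
g = lambda x: np.sin(np.pi * x)
lhs = abs(ip(f, g))
rhs = np.sqrt(ip(f, f)) * np.sqrt(ip(g, g))
print(lhs, "<=", rhs)    # Cauchy-Schwarz inequality
```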

7.6.5 Legendre polynomials

Take $[a,b] = [-1,1]$ and weight function

$$w(x) \equiv 1$$

The resulting orthogonal polynomials are called the Legendre polynomials; in orthonormal form, the first few are

$$\phi_0(x) = \frac{1}{\sqrt{2}}, \quad \phi_1(x) = \sqrt{\frac{3}{2}}\, x, \quad \phi_2(x) = \half \sqrt{\frac{5}{2}} (3x^2 - 1)$$

They are usually defined in terms of the Rodrigues Formula

$$P_0(x) = 1, \qquad P_n(x) = \frac{(-1)^n}{2^n n!} \dd{^n}{x^n}(1-x^2)^n, \qquad n \ge 1$$

which are polynomial solutions of Legendre’s differential equation

$$\dd{}{x}\left[(1-x^2) \dd{P_n}{x}\right] + n(n+1)P_n = 0, \qquad P_n(1) = 1$$

The $P_n$ satisfy the orthogonality conditions

$$\ip{P_n,P_n} = \frac{2}{2n+1}, \qquad \ip{P_n,P_m} = 0, \quad n \ne m$$

and the recurrence relation

$$P_{n+1}(x) = \frac{2n+1}{n+1} x P_n(x) - \frac{n}{n+1} P_{n-1}(x)$$

The orthonormal functions are

$$\phi_n(x) = \sqrt{\frac{2n+1}{2}}\, P_n(x)$$
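As a quick numerical check (illustrative only, using `scipy.special.eval_legendre`), we can verify the orthogonality relations for the first few Legendre polynomials.

```python
# Sketch: verify (P_n, P_n) = 2/(2n+1) and (P_n, P_m) = 0 for n != m.
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

for n in range(4):
    for m in range(4):
        val, _ = quad(lambda x: eval_legendre(n, x) * eval_legendre(m, x), -1, 1)
        expected = 2.0 / (2*n + 1) if n == m else 0.0
        assert abs(val - expected) < 1e-8
print("Legendre orthogonality verified for n, m = 0,...,3")
```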

7.6.6 Chebyshev polynomials

Take $[a,b] = [-1,1]$ and weight function

$$w(x) = \frac{1}{\sqrt{1-x^2}}$$

The resulting sequence of orthogonal polynomials can be written in terms of the Chebyshev polynomials

$$\begin{aligned} T_0(x) &= 1 \\ T_1(x) &= x \\ T_{n+1}(x) &= 2 x T_n(x) - T_{n-1}(x), \quad n \ge 1 \end{aligned}$$

or

$$T_n(x) = \cos(n \cos^{-1} x)$$

They are orthogonal with respect to the above weight:

$$\ip{T_n, T_m} = \begin{cases} 0, & n \ne m \\ \pi, & m = n = 0 \\ \half\pi, & m = n > 0 \end{cases}$$

Hence the orthonormal sequence is

$$\phi_0(x) = \frac{1}{\sqrt{\pi}}, \qquad \phi_n(x) = \sqrt{\frac{2}{\pi}}\, T_n(x), \quad n \ge 1$$
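A small sketch verifying these relations: with the substitution $x = \cos\theta$, the weighted integral becomes $\int_0^\pi \cos(n\theta)\cos(m\theta) \ud\theta$, which avoids the endpoint singularity of the weight.

```python
# Sketch: Chebyshev orthogonality via the change of variable x = cos(theta).
import numpy as np
from scipy.integrate import quad

def cheb_ip(n, m):
    # (T_n, T_m) = int_0^pi cos(n t) cos(m t) dt
    val, _ = quad(lambda t: np.cos(n*t) * np.cos(m*t), 0, np.pi)
    return val

print(cheb_ip(0, 0))   # pi
print(cheb_ip(3, 3))   # pi/2
print(cheb_ip(2, 5))   # 0 (up to quadrature error)
```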

7.6.7 Laguerre polynomials

Take $[a,b) = [0,\infty)$ and weight function

$$w(x) = \ee^{-x}, \qquad x \in [0,\infty)$$

The resulting sequence of orthogonal polynomials are called Laguerre polynomials and are usually written as

$$L_n(x) = \frac{\ee^x}{n!} \dd{^n}{x^n}(x^n \ee^{-x}), \qquad n \ge 0$$

which are orthonormal and hence $\phi_n(x) = L_n(x)$. Moreover they obey the recurrence relation

$$L_{n+1}(x) = \frac{2n + 1 - x}{n+1} L_n(x) - \frac{n}{n+1} L_{n-1}(x)$$
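The orthonormality can be checked with Gauss-Laguerre quadrature, which builds the weight $\ee^{-x}$ into its nodes and weights; the sketch below is only illustrative.

```python
# Sketch: Laguerre polynomials are orthonormal w.r.t. w(x) = exp(-x).
import numpy as np
from numpy.polynomial.laguerre import laggauss
from scipy.special import eval_laguerre

x, w = laggauss(40)          # nodes/weights for int_0^inf exp(-x) f(x) dx
for n in range(4):
    for m in range(4):
        val = np.sum(w * eval_laguerre(n, x) * eval_laguerre(m, x))
        expected = 1.0 if n == m else 0.0
        assert abs(val - expected) < 1e-8
print("Laguerre orthonormality verified for n, m = 0,...,3")
```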

7.6.8 Least squares approximation

Recall that we are trying to minimize $\norm{f - r}$ wrt $r \in \poly_n$, where the norm comes from a weighted inner product. The solution of this problem will be simplified if we express $r$ in terms of the orthonormal polynomials $\{\phi_j\}$ corresponding to the chosen inner product. So let

$$r(x) = \sum_{j=0}^n b_j \phi_j(x)$$

Then for a given $f \in \cts[a,b]$,

$$\norm{f-r}^2 = \int_a^b w(x) \left[ f(x) - \sum_{j=0}^n b_j \phi_j(x) \right]^2 \ud x =: G(b_0, b_1, \ldots, b_n)$$

is to be minimized wrt the parameters $b_0, b_1, \ldots, b_n$. Now

$$\begin{aligned} 0 &\le G(b_0, b_1, \ldots, b_n) \\ &= \ip{ f - \sum_{j=0}^n b_j \phi_j, \ f - \sum_{i=0}^n b_i \phi_i} \\ &= \ip{f,f} - 2 \sum_{j=0}^n b_j \ip{f,\phi_j} + \sum_{i=0}^n \sum_{j=0}^n b_i b_j \underbrace{\ip{\phi_i, \phi_j}}_{\delta_{ij}} \\ &= \norm{f}^2 - 2 \sum_{j=0}^n b_j \ip{f,\phi_j} + \sum_{j=0}^n b_j^2 \\ &= \norm{f}^2 - \sum_{j=0}^n \ip{f,\phi_j}^2 + \sum_{j=0}^n [\ip{f,\phi_j} - b_j]^2 \end{aligned}$$

The first two terms do not depend on the $\{b_j\}$ and the third term is non-negative. Hence $G$ is minimized iff the third term is zero, which implies that

$$b_j = \ip{f,\phi_j}, \qquad j=0,1,\ldots,n$$

is the unique solution of the minimization problem. Hence the least squares approximation exists, is unique, and is given by

$$r_n^*(x) = \sum_{j=0}^n \ip{f,\phi_j}\, \phi_j(x)$$

As $n \to \infty$, the best approximations converge to $f$ in the 2-norm.
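As an illustration (a sketch, with the arbitrary choice $f(x) = \ee^x$ on $[-1,1]$), the coefficients $b_j = \ip{f, \phi_j}$ can be computed by quadrature in the orthonormal Legendre basis, and the 2-norm error decreases rapidly with $n$.

```python
# Sketch: least squares approximation of f(x) = exp(x) on [-1,1] using the
# orthonormal Legendre basis phi_j = sqrt((2j+1)/2) P_j, with b_j = (f, phi_j).
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

f = lambda x: np.exp(x)
phi = lambda j, x: np.sqrt((2*j + 1) / 2) * eval_legendre(j, x)

def lsq_approx(n):
    # b_j = (f, phi_j), computed by quadrature
    b = [quad(lambda x: f(x) * phi(j, x), -1, 1)[0] for j in range(n + 1)]
    return lambda x: sum(bj * phi(j, x) for j, bj in enumerate(b))

for n in [1, 2, 4, 8]:
    rn = lsq_approx(n)
    err2, _ = quad(lambda x: (f(x) - rn(x))**2, -1, 1)
    print(f"n = {n}, ||f - r_n||_2 = {np.sqrt(err2):.2e}")
```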

7.6.9 Chebyshev least squares approximation

The best approximation in this basis is given by

$$C_n(x) = \sum_{j=0}^n{}' a_j T_j(x), \qquad a_j = \frac{2}{\pi} \int_{-1}^1 \frac{f(x) T_j(x)}{\sqrt{1-x^2}} \ud x$$

where the prime indicates that the first term must be multiplied by $\half$. Let us make the change of variable

$$x = \cos\theta, \qquad \theta \in [0,\pi]$$

Then $T_j(x) = \cos(j \cos^{-1} x) = \cos(j\theta)$ and

$$C_n(\cos\theta) = \sum_{j=0}^n{}' a_j \cos(j\theta), \qquad a_j = \frac{2}{\pi} \int_0^\pi \cos(j\theta) f(\cos\theta) \ud\theta$$

Define

$$F(\theta) = f(\cos\theta), \qquad \theta \in [0,\pi]$$

and extend it to $[-\pi,0]$ as an even function

$$F(\theta) = f(\cos\theta), \qquad \theta \in [-\pi,\pi]$$

The Chebyshev series becomes a Fourier cosine series; compare with (8), (2):

$$S_n F(\theta) = \sum_{j=0}^n{}' a_j \cos(j\theta), \qquad a_j = \frac{1}{\pi} \int_{-\pi}^\pi \cos(j\theta) F(\theta) \ud\theta$$

Since $F(\theta)$ is even, the sine terms vanish, and this is also the full Fourier series.


Now if $f(x)$ is piecewise continuous, so is $F(\theta)$, and we know that

$$|a_j| = \order{\frac{1}{j}}, \qquad j \to \infty$$

In this case, we have $\norm{f - C_n}_2 \to 0$.


If $f \in \cts^{\nu-1}[-1,1]$ for some $\nu \ge 1$ and $f^{(\nu)}$ is piecewise continuous, then $F \in \cts^{\nu-1}_p[-\pi,\pi]$ and $F^{(\nu)}$ is piecewise continuous. This means that

$$|a_j| = \order{\frac{1}{j^{\nu+1}}}, \qquad j \to \infty$$

In this case, we also get $\norm{f - C_n}_\infty \to 0$.


Note that we do not require $f^{(s)}(-1) = f^{(s)}(1)$; because of the way $F(\theta)$ is constructed, it satisfies $F^{(s)}(-\pi) = F^{(s)}(\pi)$ if $f \in \cts^s[-1,1]$.
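The decay of the coefficients is easy to observe numerically; the sketch below uses the smooth illustrative choice $f(x) = \ee^x$ and computes $a_j$ from the θ-integral above.

```python
# Sketch: decay of the Chebyshev coefficients a_j for a smooth f (here exp(x)).
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(x)

def cheb_coef(j):
    # a_j = (2/pi) * int_0^pi cos(j*theta) f(cos(theta)) d(theta)
    val, _ = quad(lambda t: np.cos(j*t) * f(np.cos(t)), 0, np.pi,
                  epsabs=1e-13, epsrel=1e-13, limit=200)
    return 2.0 * val / np.pi

for j in range(11):
    print(f"j = {j:2d}, |a_j| = {abs(cheb_coef(j)):.2e}")
```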

7.6.10 Chebyshev least squares and minimax

Suppose $f$ is very smooth, say $f^{(\nu)}$ is of bounded variation for some $\nu \gg 1$; then $|a_j| = \order{1/j^{\nu+1}}$ and

$$f(x) - C_n(x) = \sum_{j=n+1}^\infty a_j T_j(x) \approx a_{n+1} T_{n+1}(x)$$

Now

$$|T_{n+1}(x)| \le 1, \qquad x \in [-1,+1]$$

and at the $n+2$ points

$$x_j = \cos\left( \frac{j\pi}{n+1} \right), \quad j=0,1,2,\ldots,n+1$$

it takes extreme values

$$T_{n+1}(x_j) = (-1)^j$$

The error

$$f(x_j) - C_n(x_j) \approx a_{n+1} (-1)^j$$

is approximately equi-oscillating at these $n+2$ points. This indicates that the Chebyshev least squares approximation can be expected to be close to the minimax approximation, $C_n \approx p_n^*$.
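We can observe this near equi-oscillation numerically. The sketch below uses the illustrative choices $f(x) = \ee^x$ and $n = 6$, computes $C_n$ by quadrature, and evaluates the error at the extrema of $T_{n+1}$.

```python
# Sketch: for smooth f, the error f - C_n nearly equi-oscillates at the
# n+2 extreme points of T_{n+1}. Choices f(x) = exp(x), n = 6 are illustrative.
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(x)
n = 6

a = np.array([2/np.pi * quad(lambda t: np.cos(j*t) * f(np.cos(t)), 0, np.pi)[0]
              for j in range(n + 1)])
a[0] *= 0.5                                       # the primed sum halves the first term

def Cn(x):
    theta = np.arccos(x)
    return sum(a[j] * np.cos(j*theta) for j in range(n + 1))

xj = np.cos(np.arange(n + 2) * np.pi / (n + 1))   # extrema of T_{n+1}
print(f(xj) - Cn(xj))                             # alternating, nearly equal magnitude
```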

7.6.11 Chebyshev interpolation and minimax

If $f$ is very smooth, then

$$f(x) - C_n(x) \approx a_{n+1} T_{n+1}(x)$$

The error at the $n+1$ roots of $T_{n+1}$ (the Chebyshev points of the first kind), given by

$$x_j = \cos\left( \frac{2j + 1}{2n + 2}\, \pi \right), \qquad j=0,1,2,\ldots,n$$

is nearly zero.

Let us construct the polynomial $I_n(x)$ which interpolates $f(x)$ at the $n+1$ roots of $T_{n+1}(x)$. Then

$$\begin{aligned} I_n(x) &= \sum_{j=0}^n f(x_j) \ell_j(x) \\ &= \sum_{j=0}^n C_n(x_j) \ell_j(x) + \sum_{j=0}^n \underbrace{[f(x_j) - C_n(x_j)]}_{\approx a_{n+1}T_{n+1}(x_j) = 0} \ell_j(x) \\ &\approx \sum_{j=0}^n C_n(x_j) \ell_j(x) \\ &= C_n(x) \\ &\approx p_n^*(x) \end{aligned}$$

Thus Chebyshev interpolation at the first kind points can be expected to be close to the best approximation. We observed this in Example 2. Recall from Section 6.2.9 that these Chebyshev nodes minimize the node polynomial appearing in the polynomial interpolation error formula.
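The following sketch (again with the illustrative choices $f(x) = \ee^x$ and $n = 6$) interpolates $f$ at the first kind points with `barycentric_interpolate` and confirms that $I_n$ is very close to $C_n$.

```python
# Sketch: compare Chebyshev interpolation at first kind points with the
# Chebyshev least squares approximation C_n. Choices f, n are illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.interpolate import barycentric_interpolate

f = lambda x: np.exp(x)
n = 6

xi = np.cos((2*np.arange(n + 1) + 1) * np.pi / (2*n + 2))   # roots of T_{n+1}
x = np.linspace(-1, 1, 1000)
In = barycentric_interpolate(xi, f(xi), x)                  # interpolant I_n

a = np.array([2/np.pi * quad(lambda t: np.cos(j*t) * f(np.cos(t)), 0, np.pi)[0]
              for j in range(n + 1)])
a[0] *= 0.5
Cn = sum(a[j] * np.cos(j*np.arccos(x)) for j in range(n + 1))

print("max |I_n - C_n| =", np.max(np.abs(In - Cn)))
print("max |f - I_n|   =", np.max(np.abs(f(x) - In)))
```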
