
7.6 Least squares approximation

```python
#%config InlineBackend.figure_format = 'svg'
from pylab import *
from scipy.interpolate import barycentric_interpolate
```

7.6.1 Best approximation in 2-norm

The minimax approximation is the best approximation in the maximum norm, since it minimizes the largest error over the interval. Computing it requires solving an optimization problem, and the Remez algorithm is expensive. Optimization problems are often solved by gradient-based methods, but the maximum norm is not differentiable, so such methods cannot be applied directly. To use gradient methods, let us change the norm to the $L^2$ norm

$$\norm{g}_2 = \left( \int_a^b |g(x)|^2 \ud x \right)^{1/2}, \qquad g \in \cts[a,b]$$

For a given $f \in \cts[a,b]$ and $n \ge 0$, define

$$M_n(f) = \inf_{r \in \poly_n} \norm{f - r}_2$$

  1. Is there a best approximating polynomial $r_n^* \in \poly_n$, i.e.,

     $$\norm{f - r_n^*}_2 = M_n(f)$$

  2. Is it unique?

  3. How can we compute it?

7.6.2 General least squares problem

Let there be given a weight function

$$w(x) \ge 0, \qquad x \in [a,b]$$

such that

  1. $\int_a^b |x|^n w(x) \ud x < \infty$ for all $n \ge 0$

  2. If $\int_a^b w(x) g(x) \ud x = 0$ for some non-negative continuous function $g$, then $g(x) \equiv 0$ on $(a,b)$.

Examples of commonly used weight functions are

$$\begin{aligned} w(x) &= 1, \qquad\qquad a \le x \le b \\ w(x) &= \frac{1}{\sqrt{1-x^2}}, \quad -1 \le x \le +1 \\ w(x) &= \ee^{-x}, \qquad\quad 0 \le x < \infty \\ w(x) &= \ee^{-x^2}, \quad\quad -\infty < x < \infty \end{aligned}$$

7.6.2.1 Least squares problem

Given $f \in \cts[a,b]$, find the polynomial $r_n^* \in \poly_n$ which minimizes

$$\int_a^b w(x) [f(x) - r(x)]^2 \ud x$$

among all polynomials $r \in \poly_n$.

  1. Does it exist?
  2. Is it unique?
  3. How to find it?

7.6.3 Least squares using monomials

Find the coefficients in the polynomial

$$r_n(x) = \sum_{j=0}^n a_j x^j, \qquad x \in [a,b] = [0,1]$$

so that, with weight function $w(x) \equiv 1$,

$$F(a_0, a_1, \ldots, a_n) := \int_0^1 [f(x) - r_n(x)]^2 \ud x$$

is minimized. The necessary optimality conditions are

$$\df{F}{a_i} = 0 \quad\implies\quad \sum_{j=0}^n a_j \int_0^1 x^{i+j} \ud x = \int_0^1 f(x) x^i \ud x, \qquad i=0,1,\ldots,n$$

This is a set of $n+1$ coupled linear equations

$$Aa = b, \qquad A_{ij} = \frac{1}{i+j+1}, \qquad 0 \le i,j \le n$$

$A$ is called the Hilbert matrix and it is highly ill-conditioned,

$$\cond(A) = \order{ \frac{(1 + \sqrt{2})^{4n+4}}{\sqrt{n+1}} }$$

The monomial basis $1, x, x^2, \ldots, x^n$ is linearly independent, but for large powers $m, n$ the functions $x^m, x^n$ are nearly identical, which causes the bad condition number. Note that we can write the matrix elements as

$$A_{ij} = \int_0^1 \phi_i(x) \phi_j(x) \ud x, \qquad \phi_i(x) = x^i$$

If we could make the matrix diagonal, i.e.

$$A_{ij} = 0, \qquad i \ne j$$

then its solution would become trivial and we would not need to worry about ill-conditioning. We should use a set of basis functions which has this property, instead of the monomials.
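Before moving on, here is a small numerical sketch (the matrix sizes are only illustrative) showing how quickly the Hilbert matrix becomes ill-conditioned, compared with the estimate above.

```python
# Sketch: condition number of the Hilbert matrix A_ij = 1/(i+j+1), 0 <= i,j <= n,
# compared with the asymptotic estimate (1+sqrt(2))^(4n+4)/sqrt(n+1).
import numpy as np

for n in range(2, 11, 2):
    i = np.arange(n + 1)
    A = 1.0 / (i[:, None] + i[None, :] + 1)              # (n+1) x (n+1) Hilbert matrix
    estimate = (1 + np.sqrt(2))**(4*n + 4) / np.sqrt(n + 1)
    print(f"n = {n:2d}, cond(A) = {np.linalg.cond(A):.2e}, estimate = {estimate:.2e}")
```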

7.6.4 Orthogonal polynomials

Let $w : (a,b) \to (0,\infty)$ be a weight function and define the inner product

$$\ip{f,g} = \int_a^b w(x) f(x) g(x) \ud x$$

The inner product satisfies these properties.

  1. $\ip{\alpha f, g} = \ip{f, \alpha g} = \alpha \ip{f,g}$ for all $\alpha \in \re$

  2. $\ip{f_1 + f_2, g} = \ip{f_1,g} + \ip{f_2,g}$ and $\ip{f, g_1 + g_2} = \ip{f,g_1} + \ip{f,g_2}$

  3. $\ip{f,g} = \ip{g,f}$

  4. $\ip{f,f} \ge 0$ for all $f \in \cts[a,b]$ and $\ip{f,f} = 0$ iff $f = 0$

Such an inner product gives rise to a norm

$$\norm{f} = \sqrt{\ip{f,f}} = \sqrt{\int_a^b w(x) |f(x)|^2 \ud x}$$

Moreover, we have the Cauchy-Schwarz inequality

$$|\ip{f,g}| \le \norm{f} \cdot \norm{g}$$
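These definitions are easy to check numerically. Below is a minimal sketch using scipy's `quad` to evaluate the inner product; the weight $w(x) = 1$ on $[0,1]$ and the functions $f, g$ are illustrative choices only.

```python
# Sketch: weighted inner product via quadrature and a Cauchy-Schwarz check.
import numpy as np
from scipy.integrate import quad

def ip(f, g, w=lambda x: 1.0, a=0.0, b=1.0):
    """Inner product (f,g) = int_a^b w(x) f(x) g(x) dx."""
    val, _ = quad(lambda x: w(x) * f(x) * g(x), a, b)
    return val

f = lambda x: np.exp(x)
g = lambda x: np.sin(np.pi * x)
lhs = abs(ip(f, g))
rhs = np.sqrt(ip(f, f)) * np.sqrt(ip(g, g))
print(lhs, "<=", rhs)    # Cauchy-Schwarz inequality
```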

7.6.5 Legendre polynomials

Take $[a,b] = [-1,1]$ and weight function

$$w(x) \equiv 1$$

The resulting orthogonal polynomials are called the Legendre polynomials; in orthonormal form, the first few are

$$\phi_0(x) = \frac{1}{\sqrt{2}}, \quad \phi_1(x) = \sqrt{\frac{3}{2}}\, x, \quad \phi_2(x) = \half \sqrt{\frac{5}{2}} (3x^2 - 1)$$

They are usually defined in terms of the Rodrigues Formula

$$P_0(x) = 1, \qquad P_n(x) = \frac{(-1)^n}{2^n n!} \dd{^n}{x^n}(1-x^2)^n, \qquad n \ge 1$$

which are polynomial solutions of Legendre’s differential equation

$$\dd{}{x}\left[(1-x^2) \dd{P_n}{x}\right] + n(n+1)P_n = 0, \qquad P_n(1) = 1$$

The $P_n$ satisfy the orthogonality conditions

$$\ip{P_n,P_n} = \frac{2}{2n+1}, \qquad \ip{P_n,P_m} = 0, \quad n \ne m$$

and the recurrence relation

$$P_{n+1}(x) = \frac{2n+1}{n+1} x P_n(x) - \frac{n}{n+1} P_{n-1}(x)$$

The orthonormal functions are

$$\phi_n(x) = \sqrt{\frac{2n+1}{2}}\, P_n(x)$$
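As a quick numerical check (illustrative only, using `scipy.special.eval_legendre`), we can verify the orthogonality relations for the first few Legendre polynomials.

```python
# Sketch: verify (P_n, P_n) = 2/(2n+1) and (P_n, P_m) = 0 for n != m.
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

for n in range(4):
    for m in range(4):
        val, _ = quad(lambda x: eval_legendre(n, x) * eval_legendre(m, x), -1, 1)
        expected = 2.0 / (2*n + 1) if n == m else 0.0
        assert abs(val - expected) < 1e-8
print("Legendre orthogonality verified for n, m = 0,...,3")
```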

7.6.6 Chebyshev polynomials

Take $[a,b] = [-1,1]$ and weight function

$$w(x) = \frac{1}{\sqrt{1-x^2}}$$

The resulting sequence of orthogonal polynomials can be written in terms of the Chebyshev polynomials

$$\begin{aligned} T_0(x) &= 1 \\ T_1(x) &= x \\ T_{n+1}(x) &= 2 x T_n(x) - T_{n-1}(x), \quad n \ge 1 \end{aligned}$$

or

$$T_n(x) = \cos(n \cos^{-1} x)$$

They are orthogonal with respect to the above weight:

$$\ip{T_n, T_m} = \begin{cases} 0, & n \ne m \\ \pi, & m = n = 0 \\ \half\pi, & m = n > 0 \end{cases}$$

Hence the orthonormal sequence is

$$\phi_0(x) = \frac{1}{\sqrt{\pi}}, \qquad \phi_n(x) = \sqrt{\frac{2}{\pi}}\, T_n(x), \quad n \ge 1$$
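A small sketch verifying these relations: with the substitution $x = \cos\theta$, the weighted integral becomes $\int_0^\pi \cos(n\theta)\cos(m\theta) \ud\theta$, which avoids the endpoint singularity of the weight.

```python
# Sketch: Chebyshev orthogonality via the change of variable x = cos(theta).
import numpy as np
from scipy.integrate import quad

def cheb_ip(n, m):
    # (T_n, T_m) = int_0^pi cos(n t) cos(m t) dt
    val, _ = quad(lambda t: np.cos(n*t) * np.cos(m*t), 0, np.pi)
    return val

print(cheb_ip(0, 0))   # pi
print(cheb_ip(3, 3))   # pi/2
print(cheb_ip(2, 5))   # 0 (up to quadrature error)
```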

7.6.7 Laguerre polynomials

Take $[a,b) = [0,\infty)$ and weight function

$$w(x) = \ee^{-x}, \qquad x \in [0,\infty)$$

The resulting sequence of orthogonal polynomials are called Laguerre polynomials and are usually written as

$$L_n(x) = \frac{\ee^x}{n!} \dd{^n}{x^n}(x^n \ee^{-x}), \qquad n \ge 0$$

which are orthonormal and hence $\phi_n(x) = L_n(x)$. Moreover they obey the recurrence relation

$$L_{n+1}(x) = \frac{2n + 1 - x}{n+1} L_n(x) - \frac{n}{n+1} L_{n-1}(x)$$
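The orthonormality can be checked with Gauss-Laguerre quadrature, which builds the weight $\ee^{-x}$ into its nodes and weights; the sketch below is only illustrative.

```python
# Sketch: Laguerre polynomials are orthonormal w.r.t. w(x) = exp(-x).
import numpy as np
from numpy.polynomial.laguerre import laggauss
from scipy.special import eval_laguerre

x, w = laggauss(40)          # nodes/weights for int_0^inf exp(-x) f(x) dx
for n in range(4):
    for m in range(4):
        val = np.sum(w * eval_laguerre(n, x) * eval_laguerre(m, x))
        expected = 1.0 if n == m else 0.0
        assert abs(val - expected) < 1e-8
print("Laguerre orthonormality verified for n, m = 0,...,3")
```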

7.6.8 Least squares approximation

Recall that we are trying to minimize $\norm{f - r}$ wrt $r \in \poly_n$, where the norm comes from a weighted inner product. The solution of this problem will be simplified if we express $r$ in terms of the orthonormal polynomials $\{\phi_j\}$ corresponding to the chosen inner product. So let

$$r(x) = \sum_{j=0}^n b_j \phi_j(x)$$

Then for a given $f \in \cts[a,b]$,

$$\norm{f-r}^2 = \int_a^b w(x) \left[ f(x) - \sum_{j=0}^n b_j \phi_j(x) \right]^2 \ud x =: G(b_0, b_1, \ldots, b_n)$$

is to be minimized wrt the parameters $b_0, b_1, \ldots, b_n$. Now

$$\begin{aligned} 0 &\le G(b_0, b_1, \ldots, b_n) \\ &= \ip{ f - \sum_{j=0}^n b_j \phi_j, \ f - \sum_{i=0}^n b_i \phi_i} \\ &= \ip{f,f} - 2 \sum_{j=0}^n b_j \ip{f,\phi_j} + \sum_{i=0}^n \sum_{j=0}^n b_i b_j \underbrace{\ip{\phi_i, \phi_j}}_{\delta_{ij}} \\ &= \norm{f}^2 - 2 \sum_{j=0}^n b_j \ip{f,\phi_j} + \sum_{j=0}^n b_j^2 \\ &= \norm{f}^2 - \sum_{j=0}^n \ip{f,\phi_j}^2 + \sum_{j=0}^n [\ip{f,\phi_j} - b_j]^2 \end{aligned}$$

The first two terms do not depend on the $\{b_j\}$ and the third term is non-negative. Hence $G$ is minimized iff the third term is zero, which implies that

$$b_j = \ip{f,\phi_j}, \qquad j=0,1,\ldots,n$$

is the unique solution of the minimization problem. Hence the least squares approximation exists, is unique, and is given by

$$r_n^*(x) = \sum_{j=0}^n \ip{f,\phi_j}\, \phi_j(x)$$

As $n \to \infty$, the best approximations converge to $f$ in the 2-norm.
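As an illustration (a sketch, with the arbitrary choice $f(x) = \ee^x$ on $[-1,1]$), the coefficients $b_j = \ip{f, \phi_j}$ can be computed by quadrature in the orthonormal Legendre basis, and the 2-norm error decreases rapidly with $n$.

```python
# Sketch: least squares approximation of f(x) = exp(x) on [-1,1] using the
# orthonormal Legendre basis phi_j = sqrt((2j+1)/2) P_j, with b_j = (f, phi_j).
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

f = lambda x: np.exp(x)
phi = lambda j, x: np.sqrt((2*j + 1) / 2) * eval_legendre(j, x)

def lsq_approx(n):
    # b_j = (f, phi_j), computed by quadrature
    b = [quad(lambda x: f(x) * phi(j, x), -1, 1)[0] for j in range(n + 1)]
    return lambda x: sum(bj * phi(j, x) for j, bj in enumerate(b))

for n in [1, 2, 4, 8]:
    rn = lsq_approx(n)
    err2, _ = quad(lambda x: (f(x) - rn(x))**2, -1, 1)
    print(f"n = {n}, ||f - r_n||_2 = {np.sqrt(err2):.2e}")
```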

7.6.9 Chebyshev least squares approximation

The best approximation in this basis is given by

$$C_n(x) = \sum_{j=0}^n{}' a_j T_j(x), \qquad a_j = \frac{2}{\pi} \int_{-1}^1 \frac{f(x) T_j(x)}{\sqrt{1-x^2}} \ud x$$

where the prime indicates that the first term must be multiplied by $\half$. Let us make the change of variable

$$x = \cos\theta, \qquad \theta \in [0,\pi]$$

Then $T_j(x) = \cos(j \cos^{-1} x) = \cos(j\theta)$ and

$$C_n(\cos\theta) = \sum_{j=0}^n{}' a_j \cos(j\theta), \qquad a_j = \frac{2}{\pi} \int_0^\pi \cos(j\theta) f(\cos\theta) \ud\theta$$

Define

$$F(\theta) = f(\cos\theta), \qquad \theta \in [0,\pi]$$

and extend it to $[-\pi,0]$ as an even function

$$F(\theta) = f(\cos\theta), \qquad \theta \in [-\pi,\pi]$$

The Chebyshev series becomes a Fourier cosine series; compare with (8), (2):

$$S_n F(\theta) = \sum_{j=0}^n{}' a_j \cos(j\theta), \qquad a_j = \frac{1}{\pi} \int_{-\pi}^\pi \cos(j\theta) F(\theta) \ud\theta$$

Since $F(\theta)$ is even, the sine terms vanish, and this is also the full Fourier series.


Now if $f(x)$ is piecewise continuous, so is $F(\theta)$, and we know that

$$|a_j| = \order{\frac{1}{j}}, \qquad j \to \infty$$

In this case, we have $\norm{f - C_n}_2 \to 0$.


If $f \in \cts^{\nu-1}[-1,1]$ for some $\nu \ge 1$ and $f^{(\nu)}$ is piecewise continuous, then $F \in \cts^{\nu-1}_p[-\pi,\pi]$ and $F^{(\nu)}$ is piecewise continuous. This means that

$$|a_j| = \order{\frac{1}{j^{\nu+1}}}, \qquad j \to \infty$$

In this case, we also get $\norm{f - C_n}_\infty \to 0$.


Note that we do not require $f^{(s)}(-1) = f^{(s)}(1)$; because of the way $F(\theta)$ is constructed, it satisfies $F^{(s)}(-\pi) = F^{(s)}(\pi)$ if $f \in \cts^s[-1,1]$.
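The decay of the coefficients is easy to observe numerically; the sketch below uses the smooth illustrative choice $f(x) = \ee^x$ and computes $a_j$ from the θ-integral above.

```python
# Sketch: decay of the Chebyshev coefficients a_j for a smooth f (here exp(x)).
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(x)

def cheb_coef(j):
    # a_j = (2/pi) * int_0^pi cos(j*theta) f(cos(theta)) d(theta)
    val, _ = quad(lambda t: np.cos(j*t) * f(np.cos(t)), 0, np.pi,
                  epsabs=1e-13, epsrel=1e-13, limit=200)
    return 2.0 * val / np.pi

for j in range(11):
    print(f"j = {j:2d}, |a_j| = {abs(cheb_coef(j)):.2e}")
```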

7.6.10 Chebyshev least squares and minimax

Suppose $f$ is very smooth, say $f^{(\nu)}$ is of bounded variation for some $\nu \gg 1$; then $|a_j| = \order{1/j^{\nu+1}}$ and

$$f(x) - C_n(x) = \sum_{j=n+1}^\infty a_j T_j(x) \approx a_{n+1} T_{n+1}(x)$$

Now

$$|T_{n+1}(x)| \le 1, \qquad x \in [-1,+1]$$

and at the $n+2$ points

$$x_j = \cos\left( \frac{j\pi}{n+1} \right), \quad j=0,1,2,\ldots,n+1$$

it takes extreme values

$$T_{n+1}(x_j) = (-1)^j$$

The error

$$f(x_j) - C_n(x_j) \approx a_{n+1} (-1)^j$$

is approximately equi-oscillating at these $n+2$ points. This indicates that the Chebyshev least squares approximation can be expected to be close to the minimax approximation, $C_n \approx p_n^*$.
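We can observe this near equi-oscillation numerically. The sketch below uses the illustrative choices $f(x) = \ee^x$ and $n = 6$, computes $C_n$ by quadrature, and evaluates the error at the extrema of $T_{n+1}$.

```python
# Sketch: for smooth f, the error f - C_n nearly equi-oscillates at the
# n+2 extreme points of T_{n+1}. Choices f(x) = exp(x), n = 6 are illustrative.
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(x)
n = 6

a = np.array([2/np.pi * quad(lambda t: np.cos(j*t) * f(np.cos(t)), 0, np.pi)[0]
              for j in range(n + 1)])
a[0] *= 0.5                                       # the primed sum halves the first term

def Cn(x):
    theta = np.arccos(x)
    return sum(a[j] * np.cos(j*theta) for j in range(n + 1))

xj = np.cos(np.arange(n + 2) * np.pi / (n + 1))   # extrema of T_{n+1}
print(f(xj) - Cn(xj))                             # alternating, nearly equal magnitude
```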

7.6.11 Chebyshev interpolation and minimax

If $f$ is very smooth, then

$$f(x) - C_n(x) \approx a_{n+1} T_{n+1}(x)$$

The error at the $n+1$ roots of $T_{n+1}$ (the Chebyshev points of the first kind), given by

$$x_j = \cos\left( \frac{2j + 1}{2n + 2}\, \pi \right), \qquad j=0,1,2,\ldots,n$$

is nearly zero.

Let us construct the polynomial $I_n(x)$ which interpolates $f(x)$ at the $n+1$ roots of $T_{n+1}(x)$. Then

$$\begin{aligned} I_n(x) &= \sum_{j=0}^n f(x_j) \ell_j(x) \\ &= \sum_{j=0}^n C_n(x_j) \ell_j(x) + \sum_{j=0}^n \underbrace{[f(x_j) - C_n(x_j)]}_{\approx a_{n+1}T_{n+1}(x_j) = 0} \ell_j(x) \\ &\approx \sum_{j=0}^n C_n(x_j) \ell_j(x) \\ &= C_n(x) \\ &\approx p_n^*(x) \end{aligned}$$

Thus Chebyshev interpolation at the first kind points can be expected to be close to the best approximation. We observed this in Example 2. Recall from Section 6.2.9 that these Chebyshev nodes minimize the node polynomial appearing in the polynomial interpolation error formula.
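The following sketch (again with the illustrative choices $f(x) = \ee^x$ and $n = 6$) interpolates $f$ at the first kind points with `barycentric_interpolate` and confirms that $I_n$ is very close to $C_n$.

```python
# Sketch: compare Chebyshev interpolation at first kind points with the
# Chebyshev least squares approximation C_n. Choices f, n are illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.interpolate import barycentric_interpolate

f = lambda x: np.exp(x)
n = 6

xi = np.cos((2*np.arange(n + 1) + 1) * np.pi / (2*n + 2))   # roots of T_{n+1}
x = np.linspace(-1, 1, 1000)
In = barycentric_interpolate(xi, f(xi), x)                  # interpolant I_n

a = np.array([2/np.pi * quad(lambda t: np.cos(j*t) * f(np.cos(t)), 0, np.pi)[0]
              for j in range(n + 1)])
a[0] *= 0.5
Cn = sum(a[j] * np.cos(j*np.arccos(x)) for j in range(n + 1))

print("max |I_n - C_n| =", np.max(np.abs(In - Cn)))
print("max |f - I_n|   =", np.max(np.abs(f(x) - In)))
```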
