
Secant Method

```python
from pylab import *
```

1. Quasi-Newton methods

The Newton method requires knowledge of the derivative. In many situations it may not be possible to compute it; e.g., if $f(x)$ is available only as a computer code, then it may not be possible to compute the derivative[1]. In such situations we can try to approximate the derivative, and methods which make use of approximate derivatives are called quasi-Newton methods. For a scalar function $f(x)$, we can approximate the derivative using a finite difference approximation, e.g.,

$$f'(x_k) \approx \frac{f(x_k+h) - f(x_k)}{h}, \qquad 0 < h \ll 1$$

Since we want to generate a sequence of approximations to the root, we can use two successive iterates $x_{k-1}$, $x_k$ to estimate the derivative

$$f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$$

Using this approximation in place of the exact derivative in the Newton method leads to the secant method.

2. Secant method

Given two starting values $x_0$, $x_1$ which are sufficiently close to the root $\alpha$, we construct a straight line approximation to $f(x)$ passing through the points $(x_0, f(x_0))$ and $(x_1, f(x_1))$. The zero of this straight line forms the next approximation $x_2$ to the root and is given by

$$x_2 = x_1 - f(x_1) \frac{x_1 - x_0}{f(x_1) - f(x_0)}$$

Inductively, we obtain

$$x_{k+1} = x_k - f(x_k) \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})}$$

Note that this is just the Newton iteration formula with the derivative replaced by the finite difference approximation.
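As a quick illustration (an example added here, not part of the original notes), take $f(x) = x^2 - 2$ with $x_0 = 1$, $x_1 = 2$, so that $f(x_0) = -1$ and $f(x_1) = 2$. Then

$$x_2 = 2 - 2 \cdot \frac{2 - 1}{2 - (-1)} = \frac{4}{3} \approx 1.333$$

which is already a reasonable approximation to the root $\sqrt{2} \approx 1.414$.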

3. Convergence analysis

Subtracting the iteration formula from $\alpha$ on both sides, we get

$$\alpha - x_{k+1} = \alpha - x_k + f(x_k) \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})}$$

Define the Newton divided differences

$$f[x_{k-1},x_k] := \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}, \qquad f[x_{k-1},x_k,\alpha] := \frac{f[x_k,\alpha] - f[x_{k-1},x_k]}{\alpha - x_{k-1}}$$

where $f[x_k,\alpha] := \frac{f(\alpha) - f(x_k)}{\alpha - x_k}$. Then it is easy to show, using the mean value theorem, that

$$f[x_{k-1},x_k] = f'(y_k), \qquad \text{for some } y_k \text{ between } x_{k-1} \text{ and } x_k$$

and

$$f[x_{k-1},x_k,\alpha] = \frac{1}{2} f''(z_k), \qquad \text{for some } z_k \text{ between } x_{k-1}, x_k, \alpha$$

Since $f(\alpha) = 0$, we can write $f(x_k) = -f[x_k,\alpha](\alpha - x_k)$; substituting this into the expression for $\alpha - x_{k+1}$, we get the error formula

$$\alpha - x_{k+1} = -(\alpha - x_{k-1})(\alpha - x_k) \frac{f[x_{k-1},x_k,\alpha]}{f[x_{k-1},x_k]} = -(\alpha - x_{k-1})(\alpha - x_k) \frac{f''(z_k)}{2 f'(y_k)}$$

If we can bound the last factor

$$\left| \frac{f''(z_k)}{2 f'(y_k)} \right| \le M < \infty$$

then we get a bound on the error $e_k = |\alpha - x_k|$

$$e_{k+1} \le M e_{k-1} e_k$$
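From this recursion one can show that the secant method converges superlinearly, with order $(1+\sqrt{5})/2 \approx 1.618$. The bound is also easy to observe numerically; the following sketch (an illustration added here, using $f(x) = x^2 - 2$ with $\alpha = \sqrt{2}$) prints the ratio $e_{k+1}/(e_{k-1} e_k)$, which approaches $|f''(\alpha)/(2f'(\alpha))| = 1/(2\sqrt{2}) \approx 0.354$.

```python
from math import sqrt

f = lambda x: x**2 - 2.0
alpha = sqrt(2.0)                  # exact root, used only to measure errors
x0, x1 = 1.0, 2.0                  # starting guesses
e = [abs(alpha - x0), abs(alpha - x1)]
for k in range(5):
    x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
    e.append(abs(alpha - x2))
    # e_{k+1} / (e_{k-1} e_k) stays bounded, tending to 1/(2*sqrt(2))
    print(f"{x2:.15f}   {e[-1] / (e[-3] * e[-2]):.4f}")
    x0, x1 = x1, x2
```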

4. Error estimate

We can use the function value $|f(x_k)|$ to decide on convergence, but this may not be a good criterion if the function is very flat around the root. We can additionally estimate the error in the root itself. From the mean value theorem,

$$f(x_k) = f(x_k) - f(\alpha) = f'(\xi_k)(x_k - \alpha)$$

where $\xi_k$ is between $x_k$ and $\alpha$, so that

$$\alpha - x_k = -\frac{f(x_k)}{f'(\xi_k)}$$

If $f'$ is not changing rapidly between $x_k$ and $\alpha$, then

$$f'(\xi_k) \approx \frac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}$$

We get the error estimate

$$\alpha - x_k \approx -\frac{f(x_k)}{\dfrac{f(x_{k+1}) - f(x_k)}{x_{k+1} - x_k}} \approx x_{k+1} - x_k$$

where the last step uses $f(x_{k+1}) \approx 0$, since $x_{k+1}$ is much closer to the root.

Dividing by $\alpha$ and approximating $\alpha \approx x_{k+1}$ in the denominator gives the relative error estimate

$$\frac{\alpha - x_k}{\alpha} \approx \frac{x_{k+1} - x_k}{x_{k+1}}$$

Hence we can use the test

$$|x_{k+1} - x_k| < \epsilon |x_{k+1}|, \qquad 0 < \epsilon \ll 1$$

to decide on convergence. This is used in the code examples to stop the iterations.
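For illustration, a minimal secant solver using this stopping test might look as follows (a sketch written for these notes, not the original code; the test function $f(x) = e^x - 3x$ and the tolerance are arbitrary choices):

```python
from math import exp

def secant(f, x0, x1, eps=1.0e-12, maxiter=100):
    # Secant iteration; stop when |x_{k+1} - x_k| < eps * |x_{k+1}|
    for k in range(maxiter):
        x2 = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
        if abs(x2 - x1) < eps * abs(x2):
            return x2, k + 1
        x0, x1 = x1, x2
    raise RuntimeError("Secant method did not converge")

# f(x) = exp(x) - 3x has a root near 0.619
root, iters = secant(lambda x: exp(x) - 3.0 * x, 0.0, 1.0)
print(root, iters)
```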

5. Accuracy of FD approximation

The secant method can be written as

$$x_{k+1} = x_k - \frac{f(x_k)}{g_k}, \qquad g_k = \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$$

As $x_k \to \alpha$, the differences $x_k - x_{k-1}$ and $f(x_k) - f(x_{k-1})$ suffer from round-off error, and $g_k$ can then be inaccurate. Let us analyze this using a Taylor expansion of the finite difference approximation:

$$\frac{f(x+h) - f(x)}{h} = f'(x) + \frac{1}{2} h f''(\xi), \qquad \xi \text{ between } x \text{ and } x+h$$

We expect that as $h \to 0$ we get a more accurate approximation of the derivative. However, this assumes that we can evaluate $f(x)$ exactly and that the arithmetic is exact, neither of which is true on a computer. What we actually compute is (ignoring the additional round-off error in the subtraction)

$$\frac{\hat{f}(x+h) - \hat{f}(x)}{h}$$

where, assuming $f(x) = O(1)$,

$$\hat{f}(x+h) = f(x+h) + \epsilon_2, \qquad \hat{f}(x) = f(x) + \epsilon_1$$

and

$$\epsilon_1, \epsilon_2 = O(u), \qquad \text{where $u$ is the unit round of the computer}$$

Hence

$$\begin{aligned} \frac{\hat{f}(x+h) - \hat{f}(x)}{h} &= \frac{f(x+h) - f(x)}{h} + \frac{Cu}{h} \\ &= f'(x) + \frac{1}{2} h f''(\xi) + \frac{Cu}{h} \end{aligned}$$

for some constant $C = O(1)$. The error consists of a discretization error and a round-off error. As $h \to 0$, the discretization error decreases but the round-off error increases. There is an optimum value of $h$ for which the total error $\frac{1}{2} h |f''| + Cu/h$ is minimized, given by

$$h_{opt} = \left( \frac{2Cu}{|f''|} \right)^{1/2}$$
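This trade-off is easy to observe numerically. The following sketch (an illustration added here, using $f(x) = \sin x$ at $x = 1$) prints the forward difference error as $h$ decreases; the error is smallest near $h \approx \sqrt{u} \approx 10^{-8}$ and grows again for smaller $h$.

```python
from math import sin, cos

x, exact = 1.0, cos(1.0)            # exact derivative of sin at x = 1
for p in range(1, 15):
    h = 10.0**(-p)
    fd = (sin(x + h) - sin(x)) / h  # forward difference approximation
    print(f"h = 1e-{p:02d}   error = {abs(fd - exact):.2e}")
```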

For the above finite difference approximation, the optimum step size is thus $h = O(\sqrt{u})$. Based on this analysis, we can modify the secant method as follows. If

$$|x_k - x_{k-1}| > \left| \frac{x_{k-1} + x_k}{2} \right| \sqrt{u}$$

use the $g_k$ given above. Otherwise, take

$$g_k = \frac{f(x_k + h) - f(x_k)}{h}, \qquad h = |x_k| \sqrt{u}$$
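A sketch of the modified method (hypothetical code written for these notes, with `numpy` supplying the unit round; note that the step $h = |x_k|\sqrt{u}$ degenerates when $x_k = 0$, so a robust implementation would guard that case):

```python
import numpy as np

u = np.finfo(float).eps  # unit round of double precision arithmetic

def secant_modified(f, x0, x1, eps=1.0e-12, maxiter=100):
    for k in range(maxiter):
        if abs(x1 - x0) > abs(0.5 * (x0 + x1)) * np.sqrt(u):
            g = (f(x1) - f(x0)) / (x1 - x0)   # iterates far apart: secant slope is safe
        else:
            h = abs(x1) * np.sqrt(u)          # iterates too close: safeguarded FD step
            g = (f(x1 + h) - f(x1)) / h
        x2 = x1 - f(x1) / g
        if abs(x2 - x1) < eps * abs(x2):
            return x2, k + 1
        x0, x1 = x1, x2
    raise RuntimeError("Secant method did not converge")

root, iters = secant_modified(lambda x: x**2 - 2.0, 1.0, 2.0)
print(root, iters)
```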

6. Complex step method

We can take a complex step instead of a real step. Taylor expansion with a small imaginary step yields

$$f(x + ih) = f(x) + ih f'(x) - \frac{1}{2} h^2 f''(x) + O(h^3)$$

Hence

$$\operatorname{Im} f(x + ih) = h f'(x) + O(h^3)$$

and

$$\frac{\operatorname{Im} f(x + ih)}{h} = f'(x) + O(h^2)$$

is a second order accurate approximation to the derivative. This formula does not involve subtraction of nearly equal quantities, so it does not suffer from round-off error even if we take $h$ to be very small. In fact, it is perfectly fine to take $h$ smaller than machine precision, e.g., $h = 10^{-20}$.
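Note that this requires $f$ to be real-analytic and evaluable at complex arguments. A small demonstration (an example added here, using $f(x) = e^x \sin x$, whose derivative is $e^x(\sin x + \cos x)$) compares the complex-step derivative with a forward difference:

```python
import cmath

def f(z):
    return cmath.exp(z) * cmath.sin(z)   # analytic, accepts complex arguments

x = 1.0
exact = (cmath.exp(x) * (cmath.sin(x) + cmath.cos(x))).real

h = 1.0e-20                                # far below machine precision
cs = f(x + 1j * h).imag / h                # complex-step derivative
fd = (f(x + 1.0e-8) - f(x)).real / 1.0e-8  # forward difference, h = 1e-8

print(abs(cs - exact), abs(fd - exact))    # complex step ~ 1e-16, FD ~ 1e-8
```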

Footnotes
  1. There are methods to differentiate a computer program, known as Automatic Differentiation.