
Newton method

```python
#%config InlineBackend.figure_format = 'svg'
from pylab import *
import sympy
```

The Newton-Raphson method is an iterative method to find the roots of a function. It starts with an initial guess for the root and tries to improve it. Given an initial guess $x_0$ for the root, we have

$$f(x_0) \ne 0$$

Let us try to find a correction $\Delta x_0$ which brings it closer to the root

$$f(x_0 + \Delta x_0) = 0$$

Using the Taylor formula and truncating it at the linear term,

$$f(x_0) + \Delta x_0 f'(x_0) \approx 0 \qquad \Longrightarrow \qquad \Delta x_0 \approx - \frac{f(x_0)}{f'(x_0)}$$

We obtain a *better* estimate for the root

$$x_1 = x_0 + \Delta x_0 = x_0 - \frac{f(x_0)}{f'(x_0)}$$

Geometrically, we are approximating the function by the tangent line at $x_0$, which is given by

$$y = f(x_0) + (x - x_0) f'(x_0)$$

and finding the zero of the tangent line, see figure. We now repeat the same process for $x_1, x_2, \ldots$ until some convergence is achieved. The iterations are given by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \qquad n = 0, 1, 2, \ldots$$

This algorithm requires that $f$ is differentiable and that $f'$ is not zero in a neighbourhood of the root. A sketch of the Newton-Raphson iteration in code is given below.
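The following is a minimal sketch of the iteration in Python; the helper `newton`, the test problem $f(x) = x^3 - 2x - 5$, the initial guess and the tolerance are illustrative choices.

```python
def newton(f, df, x0, maxiter=50, tol=1e-12):
    x = x0
    for n in range(maxiter):
        dx = -f(x) / df(x)            # Newton correction
        x = x + dx
        if abs(dx) < tol:             # stop when the correction is tiny
            return x, n + 1
    return x, maxiter

# Classic test problem: root of f(x) = x^3 - 2x - 5 near x0 = 2
root, iters = newton(lambda x: x**3 - 2.0*x - 5.0,
                     lambda x: 3.0*x**2 - 2.0, 2.0)
print(root, iters)
```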

1 Convergence analysis

Let $r$ be a root of $f$. From the Taylor formula with remainder term,

$$0 = f(r) = f(x_n) + (r - x_n) f'(x_n) + \frac{1}{2} (r - x_n)^2 f''(\xi_n), \qquad \xi_n \text{ between } r \text{ and } x_n$$

Solve for $r$ from the linear term

$$r = \underbrace{x_n - \frac{f(x_n)}{f'(x_n)}}_{x_{n+1}} - \frac{1}{2} (r - x_n)^2 \frac{f''(\xi_n)}{f'(x_n)}$$

The error in the root is

$$r - x_{n+1} = - (r - x_n)^2 \frac{f''(\xi_n)}{2 f'(x_n)}$$

At least in a neighbourhood of the root, we will assume that

$$\left| \frac{f''(\xi_n)}{2 f'(x_n)} \right| \le C$$

Then the error $e_n = |r - x_n|$ satisfies

$$e_{n+1} \le C e_n^2$$

To show convergence $x_n \to r$, we have to show that $e_n \to 0$.
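To see why, multiply the error inequality by $C$ to get $C e_{n+1} \le (C e_n)^2$, so that by induction

$$C e_n \le (C e_0)^{2^n}$$

Hence if the initial guess is close enough to the root that $C e_0 < 1$, the right hand side tends to zero and $e_n \to 0$, with the error squaring in each iteration.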

2 Error estimate

From the mean value theorem,

$$f(x_n) = f(x_n) - f(r) = f'(\xi_n) (x_n - r)$$

where $\xi_n$ is between $x_n$ and $r$. Hence

$$r - x_n = - \frac{f(x_n)}{f'(\xi_n)}$$

If $f'$ is not changing rapidly between $x_n$ and $r$, then

$$f'(\xi_n) \approx f'(x_n)$$

We get the *error estimate*

$$r - x_n \approx - \frac{f(x_n)}{f'(x_n)} = x_{n+1} - x_n$$

and the *relative error estimate*

$$\frac{r - x_n}{r} \approx \frac{x_{n+1} - x_n}{x_{n+1}}$$

Hence we can use the test

$$|x_{n+1} - x_n| < \epsilon |x_{n+1}|, \qquad 0 < \epsilon \ll 1$$

to decide on convergence. This is used in the code examples to stop the iterations.
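As an illustration, the loop below stops with this relative test; the model problem $f(x) = \cos x - x$, the initial guess and the value of $\epsilon$ are arbitrary choices for the sketch.

```python
from math import cos, sin

f  = lambda x: cos(x) - x                # model problem: root of cos(x) = x
df = lambda x: -sin(x) - 1.0

x, eps = 1.0, 1.0e-10
for n in range(50):
    xnew = x - f(x) / df(x)              # Newton update
    if abs(xnew - x) < eps * abs(xnew):  # relative convergence test
        x = xnew
        break
    x = xnew
print(x, n + 1)
```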

3 Implication of quadratic convergence

What does quadratic convergence imply in practical terms? For the Newton method,

$$e_{k+1} \le M e_k^2$$

Suppose $M \approx 1.0$ and $e_0 = 10^{-1}$. Then

$$\begin{align*}
& e_1 \approx 10^{-2} \\
& e_2 \approx 10^{-4} \\
& e_3 \approx 10^{-8} \\
& e_4 \approx 10^{-16} \qquad \textrm{(limit of double precision)}
\end{align*}$$

If the root $r = O(1)$, then

$$\begin{align*}
& x_1 \textrm{ accurate to 2 digits} \\
& x_2 \textrm{ accurate to 4 digits} \\
& x_3 \textrm{ accurate to 8 digits} \\
& x_4 \textrm{ accurate to 16 digits}
\end{align*}$$

Newton’s method doubles the number of accurate figures in each iteration. Of course, this happens only after the method enters the quadratic convergence regime; in the first few iterations, there may be no reduction of the error, or less than quadratic reduction.

Let us demonstrate this quadratic convergence on the square root example.
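A small sketch, assuming the standard formulation $f(x) = x^2 - a$ with $a = 2$, whose Newton update is $x_{n+1} = \frac{1}{2}(x_n + a/x_n)$:

```python
from math import sqrt

a, x = 2.0, 2.0               # solve x^2 - a = 0 with initial guess x0 = 2
r = sqrt(a)                   # exact root, for measuring the error
for n in range(6):
    x = 0.5 * (x + a / x)     # Newton update for f(x) = x^2 - a
    print(n + 1, abs(x - r))  # the error roughly squares each iteration
```

The printed errors fall roughly like $10^{-1}, 10^{-3}, 10^{-6}, 10^{-12}$: the number of correct digits doubles in each iteration, until machine precision is reached.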

4 Inflection point

If we have a function with an inflection point, see figure, then the Newton iterates may cycle indefinitely without converging. The Newton method in step form is

$$\Delta x_k = - \frac{f(x_k)}{f'(x_k)}, \qquad x_{k+1} = x_k + \Delta x_k$$

We can use $|f(x)|$ as a metric to measure convergence towards the root: $|f(x_k)|$ must decrease monotonically towards zero. We accept the step $\Delta x_k$ if $|f(x_{k+1})| < |f(x_k)|$; otherwise we halve the step size, $\Delta x_k \leftarrow \frac{1}{2} \Delta x_k$. The step size can be halved repeatedly until a reduction in $|f(x)|$ is obtained; such a step size always exists. Moreover, to prevent wild oscillations of the iterates, the step size can also be restricted, e.g., by not allowing it to more than double in successive steps,

$$|\Delta x_k| \le 2 |\Delta x_{k-1}|$$
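A sketch of such a damped iteration, implementing the step halving but omitting the step-growth restriction, might look as follows; the function name `damped_newton`, the tolerances and the $\arctan$ test case are illustrative.

```python
from math import atan

def damped_newton(f, df, x0, maxiter=100, tol=1e-12):
    x = x0
    for k in range(maxiter):
        dx = -f(x) / df(x)                  # full Newton step
        while abs(f(x + dx)) >= abs(f(x)):  # reject steps that increase |f|
            dx = 0.5 * dx                   # halve the step size
            if abs(dx) < tol:               # step has become negligible
                return x
        x = x + dx
        if abs(f(x)) < tol:                 # converged in |f|
            return x
    return x

# Plain Newton on f(x) = atan(x) diverges from x0 = 1.5, the iterates
# oscillating with growing amplitude; the damped iteration converges to x = 0.
print(damped_newton(atan, lambda x: 1.0 / (1.0 + x*x), 1.5))
```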

5 Implicit functions

Suppose we have a function $y = y(x)$ which is defined implicitly by

$$G(x,y) = 0$$

We may not be able to find an explicit expression for $y(x)$; instead we find the value of $y$ for some discrete set of $x$ values. This leads to a root finding problem for each $x$, and we can apply the Newton method.

Take a value of $x$ and make an initial guess $y_0$. This may not satisfy $G(x, y_0) = 0$, so we seek a correction $\Delta y_0$ such that $G(x, y_0 + \Delta y_0) = 0$. By Taylor expansion,

$$G(x,y_0) + G_y(x,y_0) \Delta y_0 = 0 \quad\Longrightarrow\quad \Delta y_0 = - \frac{G(x,y_0)}{G_y(x,y_0)}$$

and hence we get an improved estimate

$$y_1 = y_0 + \Delta y_0 = y_0 - \frac{G(x,y_0)}{G_y(x,y_0)}$$

This can be used to generate an iterative scheme

$$y_{k+1} = y_k - \frac{G(x,y_k)}{G_y(x,y_k)}, \qquad k = 0, 1, 2, \ldots$$

If we have to solve this problem for several values of $x$, then we can use the previous solution as the initial guess for the next root, which can speed up the convergence of the Newton method; a sketch is given below.
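As an illustration, the sketch below traces the curve defined by $G(x,y) = y^3 + y - x = 0$ over a few values of $x$, reusing the converged $y$ as the initial guess for the next $x$; the particular $G$, the grid and the tolerance are assumptions made for the example.

```python
from numpy import linspace

G  = lambda x, y: y**3 + y - x    # implicit relation G(x,y) = 0
Gy = lambda x, y: 3.0*y**2 + 1.0  # partial derivative G_y

y = 0.0                           # initial guess at the first x
for x in linspace(0.0, 2.0, 5):
    for k in range(50):           # Newton iterations in y
        dy = -G(x, y) / Gy(x, y)
        y = y + dy
        if abs(dy) < 1e-12 * (1.0 + abs(y)):
            break
    print(x, y)                   # converged y becomes the guess for the next x
```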

6 System of equations

Consider a function $f : \mathbb{R}^N \to \mathbb{R}^N$ such that for $x = [x_1, x_2, \ldots, x_N]^\top$

$$f(x) = [f_1(x), \ldots, f_N(x)]^\top \in \mathbb{R}^N$$

We want to find an $\alpha \in \mathbb{R}^N$ such that $f(\alpha) = 0$. This is a system of $N$ non-linear equations for the $N$ unknowns. We start with an initial guess $x^0 \in \mathbb{R}^N$, and in general $f(x^0) \ne 0$. We will try to find a correction $\Delta x^0$ to $x^0$ such that

$$f(x^0 + \Delta x^0) = 0$$

Approximating by the first two terms of the Taylor expansion

$$f(x^0) + f'(x^0) \Delta x^0 = 0$$

we can find the correction by solving the matrix problem

$$f'(x^0) \Delta x^0 = - f(x^0)$$

where $f' \in \mathbb{R}^{N \times N}$ is given by

$$[f']_{ij} = \frac{\partial f_i}{\partial x_j}$$

Once we solve for $\Delta x^0 = -[f'(x^0)]^{-1} f(x^0)$, we get a new estimate of the root

$$x^1 = x^0 + \Delta x^0 = x^0 - [f'(x^0)]^{-1} f(x^0)$$

The above steps are applied iteratively; a sketch in code is given after the list. Choose an initial guess $x^0$ and set $k = 0$.

1. Solve the matrix problem

   $$f'(x^k) \Delta x^k = - f(x^k)$$

2. Update the root

   $$x^{k+1} = x^k + \Delta x^k$$

3. Set $k = k + 1$ and return to step 1 until convergence.
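The sketch below applies these steps to an assumed $2 \times 2$ model system, the intersection of the unit circle with the line $y = x$; the system, the initial guess and the stopping test are illustrative choices.

```python
from numpy import array
from numpy.linalg import solve, norm

def f(x):                                   # f : R^2 -> R^2
    return array([x[0]**2 + x[1]**2 - 1.0,  # unit circle
                  x[0] - x[1]])             # line y = x

def J(x):                                   # Jacobian matrix f'(x)
    return array([[2.0*x[0], 2.0*x[1]],
                  [1.0,     -1.0]])

x = array([1.0, 0.5])                       # initial guess x^0
for k in range(20):
    dx = solve(J(x), -f(x))                 # solve f'(x^k) dx = -f(x^k)
    x = x + dx                              # update the root
    if norm(dx) < 1e-12 * (1.0 + norm(x)):  # stop on a small correction
        break
print(k + 1, x)                             # converges to (1/sqrt(2), 1/sqrt(2))
```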