
4 Homotopy and continuation methods

#%config InlineBackend.figure_format = 'svg'
from pylab import *
from scipy.integrate import odeint
from scipy.linalg import solve, norm

The material of this Chapter is from Kincaid & Cheney, 2002.

4.1 Main idea

Let $f : X \to Y$ and consider the problem: find $x \in X$ such that $f(x) = 0$. This problem may be difficult to solve; for Newton's method we need a good initial guess, which we may not know.

Suppose we can easily solve the problem: find $x \in X$ such that $g(x) = 0$. Define

$$h(t,x) = t f(x) + (1-t) g(x)$$

Note that

$$h(0,x) = g(x), \qquad h(1,x) = f(x)$$

Partition the interval $[0,1]$ into

$$0 = t_0 < t_1 < t_2 < \ldots < t_m = 1$$

We can easily solve the problem $h(t_0,x) = g(x) = 0$; let $x_0$ be its solution. To solve the problem $h(t_1,x) = 0$, say using Newton's method, we can use $x_0$ as the initial guess. Since the two problems are close, $x_0$ can be expected to be a good initial guess. This process is continued until we solve the problem $h(t_m,x) = f(x) = 0$ using the initial guess $x_{m-1}$.

The homotopy defined above, a linear interpolation between the two problems, is the simplest example.
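As a small illustration, here is a continuation sketch in Python; the test problem $f(x) = x^3 - 2x - 5$, the easy problem $g(x) = x - 1$, and the hand-written Newton iteration are assumptions made for this example only.

# Continuation sketch (illustrative choices of f and g, not from the text)
def f(x):  return x**3 - 2*x - 5
def df(x): return 3*x**2 - 2
def g(x):  return x - 1.0
def dg(x): return 1.0

def newton(F, dF, x, maxiter=20, tol=1e-12):
    # basic Newton iteration for a scalar equation F(x) = 0
    for _ in range(maxiter):
        dx = -F(x) / dF(x)
        x = x + dx
        if abs(dx) < tol:
            break
    return x

m = 10          # number of continuation steps
x = 1.0         # solution of g(x) = 0
for k in range(1, m + 1):
    t = k / m
    h  = lambda x, t=t: t*f(x)  + (1 - t)*g(x)
    dh = lambda x, t=t: t*df(x) + (1 - t)*dg(x)
    x = newton(h, dh, x)        # previous solution is the initial guess
print(x, f(x))                  # x now approximately solves f(x) = 0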

4.2 An ODE

Instead of solving the sequence of problems, we will convert it into an ODE problem. The solution of $h(t,x) = 0$ depends on the parameter $t$, so $x(t)$ satisfies

$$h(t, x(t)) = 0$$

Differentiating with respect to $t$,

$$h_t(t, x(t)) + h_x(t,x(t))\, x'(t) = 0$$

so that

$$x'(t) = -[h_x(t,x(t))]^{-1} h_t(t,x(t)) =: R(t, x(t))$$

For this ODE to be valid, we need $h$ to be differentiable with respect to $t$ and $x$, and $h_x$ must be non-singular. Suppose we know that $x_0$ solves $h(0,x) = 0$. We solve the ODE approximately up to $t = 1$ with initial condition $x(0) = x_0$; since $h(1,x) = f(x)$, the value $x(1)$ is an approximate solution of $f(x) = 0$.
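A minimal sketch of this ODE approach, reusing the scalar example $f(x) = x^3 - 2x - 5$ and $g(x) = x - 1$ (both assumptions made for illustration) with the linear homotopy from above:

# Integrate x'(t) = -h_x^{-1} h_t for h(t,x) = t f(x) + (1-t) g(x)
# (f and g are illustrative choices, not from the text)
from pylab import *
from scipy.integrate import odeint

def f(x):  return x**3 - 2*x - 5
def df(x): return 3*x**2 - 2
def g(x):  return x - 1.0
def dg(x): return 1.0

def R(x, t):
    h_t = f(x) - g(x)                  # partial derivative of h wrt t
    h_x = t*df(x) + (1 - t)*dg(x)      # partial derivative of h wrt x
    return -h_t / h_x

tt = linspace(0.0, 1.0, 51)
xx = odeint(R, 1.0, tt)                # x(0) = 1 solves g(x) = 0
print(xx[-1, 0], f(xx[-1, 0]))         # x(1) approximately solves f(x) = 0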

4.3 Relation to Newton's method

Choose an $x_0$ and consider the homotopy

$$h(t,x) = f(x) - \ee^{-t} f(x_0)$$

so that

$$h(0,x) = f(x) - f(x_0), \qquad h(\infty,x) = f(x)$$

The curve $x(t)$ such that

$$h(t,x(t)) = 0 = f(x(t)) - \ee^{-t} f(x_0)$$

satisfies the ODE

$$0 = f'(x(t))\, x'(t) + \ee^{-t} f(x_0) = f'(x(t))\, x'(t) + f(x(t))$$

so that

$$x'(t) = -[f'(x(t))]^{-1} f(x(t))$$

Applying the forward Euler scheme with $\Delta t = 1$ gives

$$x_{n+1} = x_n - [f'(x_n)]^{-1} f(x_n)$$

which is the Newton-Raphson method.
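As a quick numerical check (an illustration, not from the text): along the exact solution of this ODE we have $f(x(t)) = \ee^{-t} f(x_0)$, while a single forward Euler step with $\Delta t = 1$ is exactly the Newton-Raphson update. The scalar test problem $f(x) = x^3 - 2x - 5$ is again an assumption for this example.

# Check f(x(t)) ~ exp(-t) f(x0) along x' = -f(x)/f'(x), and that one Euler
# step with dt = 1 equals the Newton-Raphson step (illustrative example)
from pylab import *
from scipy.integrate import odeint

def f(x):  return x**3 - 2*x - 5
def df(x): return 3*x**2 - 2

x0 = 3.0
tt = linspace(0.0, 5.0, 6)
xx = odeint(lambda x, t: -f(x)/df(x), x0, tt)[:, 0]
print(f(xx) / (exp(-tt)*f(x0)))        # ratios stay close to 1

print(x0 - f(x0)/df(x0),               # Newton-Raphson step
      x0 + 1.0*(-f(x0)/df(x0)))        # forward Euler step with dt = 1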

4.4 Linear programming

Let $c, x \in \re^n$, $A \in \re^{m \times n}$ with $m < n$, and consider the constrained optimization problem

$$\begin{cases} \max\limits_{x \in \re^n} c^\top x \\ \textrm{subject to } A x = b, \qquad x \ge 0 \end{cases}$$

Choose a starting point $x^0$ which satisfies the constraints, i.e.,

$$x^0 \in \mathcal{F} = \{ x \in \re^n : A x = b, \quad x \ge 0 \}$$

Our goal is to find $x^1$ such that $c^\top x^1 > c^\top x^0$ and $x^1 \in \mathcal{F}$. Equivalently, find a curve $x(t)$ such that

  1. $x(0) = x^0$

  2. $x(t) \ge 0$ for all $t \ge 0$

  3. $A x(t) = b$

  4. $c^\top x(t)$ is an increasing function of $t$, $t \ge 0$

Let us try to find an equation satisfied by such a curve $x(t)$:

$$x'(t) = f(x), \qquad x(0) = x^0$$

To satisfy (2), choose $f$ such that $f_i(x) \to 0$ as $x_i \to 0$. One such choice is

$$D(x) = \diag{[x_1, x_2, \ldots, x_n]}$$
$$f(x) = D(x) G(x), \qquad x_i' = x_i G_i(x)$$

To satisfy (3) we must have $A x'(t) = 0$, i.e., $A D(x) G(x) = 0$. Choose

$$G(x) = P(x) H(x)$$

where $P(x)$ maps $\re^n$ into the null space of $AD(x)$. So we can take

$$P(x) = \textrm{orthogonal projection onto the null space of } AD(x)$$

The null space of $AD(x)$ is a subspace of $\re^n$. By the Projection Theorem in Hilbert spaces, we have

$$\forall v \in \re^n, \qquad Pv \in \textrm{null space of } AD(x) \ \textrm{ iff } \ \ip{v-Pv,w} = 0 \quad \forall w \in \textrm{null space of } AD(x)$$

In particular

$$\ip{v - Pv, Pv} = 0, \qquad \forall v \in \re^n$$

Such a $P$ is given by

$$P = I - (AD)^\top [ (AD)(AD)^\top]^{-1} (AD)$$

The existence of $P$ requires that $(AD)(AD)^\top = (m\times n)(n\times m) = (m \times m)$ be invertible, and hence $\textrm{rank}\, AD = m$. This is true if each $x_i > 0$ and $\textrm{rank}\, A = m$.
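A quick numerical sanity check of this formula, with small random matrices assumed only for illustration: $P$ should satisfy $(AD)P = 0$ and $P^2 = P$.

# Verify numerically that P = I - B^T (B B^T)^{-1} B with B = A D(x) is the
# orthogonal projector onto the null space of B (random data for illustration)
from pylab import *
from scipy.linalg import solve, norm
from numpy.random import rand, seed

seed(0)
m, n = 3, 6
A = rand(m, n)
x = rand(n) + 0.1                      # strictly positive point
B = A @ diag(x)                        # B = A D(x)
P = eye(n) - B.T @ solve(B @ B.T, B)
print(norm(B @ P), norm(P @ P - P))    # both should be near zero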

Now (4) will be satisfied by the solution of the ODE if

$$\begin{aligned} 0 \le \dd{}{t} c^\top x(t) &= c^\top f(x) \\ &= c^\top D P(x) H(x) \\ &= (Dc)^\top P(x) H(x) \end{aligned}$$

Let us choose $H(x) = D(x) c$; then

$$\begin{aligned} (Dc)^\top P (Dc) &= v^\top P v, \qquad v = Dc \\ &= \ip{v, Pv} \\ &= \ip{v-Pv+Pv,\,Pv} \\ &= \ip{v-Pv,Pv} + \ip{Pv,Pv} \\ &= \ip{Pv,Pv} \\ &\ge 0 \end{aligned}$$

We have constructed an ODE model whose solution has all the required properties. How do we compute $Pv$? Using the definition of $P$ requires computing a matrix inverse, which can be expensive if the sizes involved are large. Instead, define $B = AD$ and, given $v \in \re^n$, compute $Pv$ by the following steps (a short sketch follows the list).

  1. Solve $B B^\top z = B v$ for $z$.

  2. Set $Pv = v - B^\top z$.
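A short sketch of these two steps, with random data assumed only for illustration; note that no matrix inverse is formed explicitly.

# Compute Pv by solving B B^T z = B v and setting Pv = v - B^T z
# (random B and v, assumed only for illustration)
from pylab import *
from scipy.linalg import solve, norm
from numpy.random import rand, seed

seed(1)
m, n = 3, 6
B = rand(m, n)                         # stands in for B = A D(x)
v = rand(n)

z  = solve(B @ B.T, B @ v)             # step 1: solve B B^T z = B v
Pv = v - B.T @ z                       # step 2: Pv = v - B^T z
print(norm(B @ Pv))                    # Pv lies in the null space of B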

Now we are in a position to solve the ODE

$$x'(t) = D(x) P(x) D(x) c, \qquad x(0) = x^0 \in \mathcal{F}$$

The simplest approach is the Euler method, which finds $x^{k+1} \approx x(t_k + \Delta t_k)$ from

$$\frac{x^{k+1} - x^k}{\Delta t_k} = f(x^k), \qquad k = 0,1,2,\ldots$$
$$x^{k+1} = x^k + \Delta t_k f(x^k)$$

It is easy to verify that $Ax^{k+1} = b$ since $Af(x^k) = 0$. It remains to ensure that $x^{k+1} > 0$, which may not be automatically satisfied by the Euler method. We can satisfy this condition by choosing $\Delta t_k$ small enough that

$$x_i^{k+1} = x_i^k + \Delta t_k f_i(x^k) \ge 0$$

e.g.,

$$\Delta t_k = \frac{9}{10} \min_{i \,:\, f_i(x^k) < 0} \left[ \frac{-x_i^k}{f_i(x^k)} \right]$$

The factor $9/10$ ensures the strict inequality $x^{k+1} > 0$.
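Putting the pieces together, here is a minimal sketch of this Euler iteration on a tiny toy problem; the data $A$, $b$, $c$ and the strictly feasible starting point are assumptions made only for this illustration (maximize $x_1 + 2x_2$ subject to $x_1 + x_2 + x_3 = 1$, $x \ge 0$, whose maximum value is 2).

# Euler iteration for x'(t) = D(x) P(x) D(x) c on a toy LP
# (A, b, c and the starting point are illustrative assumptions)
from pylab import *
from scipy.linalg import solve, norm

A = array([[1.0, 1.0, 1.0]])
b = array([1.0])
c = array([1.0, 2.0, 0.0])
x = array([1.0, 1.0, 1.0]) / 3.0       # strictly feasible: A x = b, x > 0

for k in range(30):
    D = diag(x)
    B = A @ D
    v = D @ c
    z = solve(B @ B.T, B @ v)          # B B^T z = B v
    f = D @ (v - B.T @ z)              # f(x) = D(x) P(x) D(x) c
    if norm(f) < 1e-12 or not (f < 0).any():
        break
    dt = 0.9 * (-x[f < 0] / f[f < 0]).min()   # keep x > 0
    x = x + dt * f
    print(k, x, c @ x)                 # objective increases towards 2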

References
  1. Kincaid, D., & Cheney, W. (2002). Numerical Analysis: Mathematics of Scientific Computing (3rd ed.). American Mathematical Society.