\(\newcommand{\re}{\mathbb{R}}\)

QR factorization#

Given \(A \in \re^{m \times n}\) with \(m \ge n\) and \(\operatorname{rank}(A)=n\), we want to write it as

\[ A = Q R \]

where \(Q \in \re^{m \times n}\) has orthonormal columns, i.e., \(Q^\top Q = I_{n \times n}\), and \(R \in \re^{n \times n}\) is upper triangular.

%config InlineBackend.figure_format = 'svg'
import numpy as np
import matplotlib.pyplot as plt
import sys
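
NumPy already provides this factorization through np.linalg.qr; as a quick sanity check of what we are aiming for, here is a small sketch on a random matrix (the name B is arbitrary):

B = np.random.rand(5, 3)           # random tall matrix
Qb, Rb = np.linalg.qr(B)           # reduced QR: Qb is 5 x 3, Rb is 3 x 3
print(Qb.shape, Rb.shape, np.allclose(B, Qb @ Rb))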

Classical Gram-Schmidt algorithm#

We now implement the classical Gram-Schmidt algorithm (Algorithm 7.1 in the book).

def clgs(A):
    # Classical Gram-Schmidt: orthogonalize the columns of A
    m, n = A.shape
    Q, R = np.empty((m,n)), np.zeros((n,n))
    for j in range(n):
        v = A[:,j]
        for i in range(j):
            R[i,j] = Q[:,i].dot(A[:,j])   # project a_j onto q_i
            v = v - R[i,j] * Q[:,i]       # subtract the projection
        R[j,j] = np.linalg.norm(v)
        Q[:,j] = v / R[j,j]               # normalize
    return Q, R

Generate a random matrix and compute the decomposition

m, n = 10, 5
A = 2 * np.random.rand(m,n) - 1
Q, R = clgs(A)

Test that \(A = Q R\) is satisfied by computing \(\max_{i,j}| (A - Q R)_{ij} |\)

np.abs(A - Q @ R).max()
np.float64(1.1102230246251565e-16)

Are the columns of \(Q\) orthonormal, i.e., \(Q^\top Q = I_{n\times n}\)?

np.abs(Q.T @ Q - np.eye(n)).max()
np.float64(2.302070312656211e-16)

Modified Gram-Schmidt#

We now implement the modified Gram-Schmidt algorithm (Algorithm 8.1 in the book).

def mgs(A):
    # Modified Gram-Schmidt: orthogonalize against each q_i
    # as soon as it is computed
    m, n = A.shape
    Q, R = np.empty((m,n)), np.zeros((n,n))
    V = A.copy() # A is not changed
    for i in range(n):
        R[i,i] = np.linalg.norm(V[:,i])
        Q[:,i] = V[:,i] / R[i,i]
        for j in range(i+1,n):
            R[i,j] = Q[:,i].dot(V[:,j])       # project the partially reduced v_j
            V[:,j] = V[:,j] - R[i,j] * Q[:,i] # remove q_i component
    return Q, R

Let us test it following the same approach as above.

Q, R = mgs(A)

Test that \(A = Q R\) is satisfied,

np.abs(A - Q @ R).max()
np.float64(2.220446049250313e-16)

Are the columns of \(Q\) orthonormal, i.e., \(Q^\top Q = I_{n\times n}\)?

np.abs(Q.T @ Q - np.eye(n)).max()
np.float64(2.53397840516755e-16)

Remark: We allocated memory for a matrix \(V\), which is not necessary; we can use a common array for both \(V\) and \(Q\):

    Q, R = A.copy(), np.zeros((n,n))
    V = Q                             # V is pointer to Q

If we are OK with modifying the matrix \(A\), then we can use \(A\) itself to store both \(V\) and \(Q\):

    Q, R = A, np.zeros((n,n))        # Q is pointer to A
    V = Q                            # V is pointer to A
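
Putting this together, here is a minimal sketch of such an in-place variant (the name mgs_inplace is ours, not from the book), which overwrites its argument:

def mgs_inplace(A):
    # Modified Gram-Schmidt that overwrites A: on return, A holds Q.
    # Assumes A is a float array.
    m, n = A.shape
    R = np.zeros((n,n))
    for i in range(n):
        R[i,i] = np.linalg.norm(A[:,i])
        A[:,i] /= R[i,i]                  # column i of A becomes q_i
        for j in range(i+1, n):
            R[i,j] = A[:,i] @ A[:,j]
            A[:,j] -= R[i,j] * A[:,i]     # remove q_i component from later columns
    return A, R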

Experiment 1: Discrete Legendre polynomials#

We orthogonalize the monomials \(1, x, x^2, x^3\) sampled on a uniform grid on \([-1,1]\). The columns of \(Q\) are then discrete analogues of the Legendre polynomials; we rescale them so that each has last entry equal to 1, mirroring the normalization \(P_k(1) = 1\) of the continuous Legendre polynomials.

x = np.arange(-128,129)/128                   # 257 uniform points in [-1,1]
A = np.column_stack((x**0, x**1, x**2, x**3)) # first four monomials on the grid
Q, R = np.linalg.qr(A,mode='reduced')
scale = Q[-1,:]                               # value of each column at x = 1
Q = Q @ np.diag(1/scale)                      # rescale so that Q[-1,:] = 1
plt.plot(Q);
(Figure: the four scaled columns of \(Q\), resembling the Legendre polynomials \(P_0, \ldots, P_3\).)
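
As a sanity check (a sketch, assuming SciPy is available), the scaled columns can be compared with the continuous Legendre polynomials; the agreement is close but not exact, since orthogonality here is with respect to the discrete grid:

from scipy.special import eval_legendre
# P_3(1) = 1, consistent with our scaling Q[-1,:] = 1
print(np.abs(Q[:,3] - eval_legendre(3, x)).max())  # small, but far from machine precision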

Experiment 2: Classical vs modified Gram-Schmidt#

First we construct a square matrix \(A\) with random singular vectors and widely varying singular values spaced by factors of 2 between \(2^{-1}\) and \(2^{-80}\).

\[ A = U \textrm{ diag}[2^{-1}, 2^{-2}, \ldots, 2^{-80}] V^* \]

Generate random unitary matrices \(U,V\) using QR factorization,

U, _ = np.linalg.qr(np.random.rand(80,80))
V, _ = np.linalg.qr(np.random.rand(80,80))
S = np.diag(2.0**np.arange(-1,-81,-1))
A = U @ S @ V.T   # V.T plays the role of V^* for real matrices

Now we use Algorithms 7.1 and 8.1 (the clgs and mgs functions above) to compute the QR factorization

Qc, Rc = clgs(A)
Qm, Rm = mgs(A)

We plot the diagonal elements \(r_{jj}\) produced by the two algorithms. Since \(r_{jj} = \| P_j a_j \|\), this gives the size of the projection at each step.

eps = 0.5 * sys.float_info.epsilon
plt.semilogy(np.diag(Rc),'o',label='Classical GS')
plt.semilogy(np.diag(Rm),'*',label='Modified GS')
plt.semilogy(2.0**np.arange(-1,-81,-1),'--',label='$2^{-j}$')
plt.semilogy([0,80],2*[np.sqrt(eps)],'--')
plt.semilogy([0,80],2*[eps],'--')
plt.xlim([0,95])
plt.text(82, 0.5*np.sqrt(eps), "$\\sqrt{\\varepsilon_{mach}}$",fontsize=12)
plt.text(82, 0.5*eps, "$\\varepsilon_{mach}$",fontsize=12)
plt.xlabel("j"); plt.ylabel("$r_{jj}$")
plt.title("Diagonal entries of R")
plt.legend();
(Figure: diagonal entries \(r_{jj}\) of \(R\) for classical and modified Gram-Schmidt, with reference lines \(2^{-j}\), \(\sqrt{\varepsilon_{mach}}\) and \(\varepsilon_{mach}\).)

The \(r_{jj}\) decrease with \(j\), closely following the line \(2^{-j}\). They are not exactly equal to the singular values of \(A\), but they are reasonably good approximations. This can be seen as follows. The SVD of \(A\) can be written as

\[ A = 2^{-1} u_1 v_1^* + 2^{-2} u_2 v_2^* + \ldots + 2^{-80} u_{80} v_{80}^* \]

where \(u_j\) and \(v_j\) are the left and right singular vectors of \(A\), respectively. The \(j\)’th column of \(A\) is

\[ a_j = 2^{-1} \bar{v}_{j1} u_1 + 2^{-2} \bar{v}_{j2} u_2 + \ldots + 2^{-80} \bar{v}_{j,80} u_{80} \]

Since the singular vectors are random, we can expect the numbers \(\bar{v}_{ji}\) to all be of similar magnitude, on the order of \(80^{-1/2} \approx 0.1\).

When we take the QR factorization, the first vector \(q_1\) is likely to be approximately equal to \(u_1\), with \(r_{11}\) on the order of \(2^{-1} \bar{v}_{11} \approx 2^{-1} \times 80^{-1/2}\). Orthogonalization at the next step will yield a second vector \(q_2\) approximately equal to \(u_2\), with \(r_{22}\) on the order of \(2^{-2} \times 80^{-1/2}\), and so on.
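
This heuristic can be checked numerically (the output varies with the random draw): the ratios \(r_{jj} / 2^{-j}\) for small \(j\) should be of rough order \(80^{-1/2} \approx 0.1\).

sigma = 2.0**np.arange(-1,-81,-1)
print(np.diag(Rm)[:5] / sigma[:5])   # expect numbers of rough order 0.1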

In the computation, the decrease of \(r_{jj}\) does not continue all the way to \(j=80\); this is a consequence of rounding errors on the computer. With the classical Gram-Schmidt algorithm, the numbers never become smaller than about \(10^{-8} \approx \sqrt{\varepsilon_{mach}}\). With the modified Gram-Schmidt algorithm, they shrink eight orders of magnitude further, down to about \(10^{-16}\), which is on the order of the machine precision.
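
The two floors can be read off directly (again, the exact values vary with the random draw):

print(np.diag(Rc).min())   # classical GS: about 1e-8, i.e. sqrt(eps)
print(np.diag(Rm).min())   # modified  GS: about 1e-16, i.e. eps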

Classical Gram-Schmidt is known to be unstable and is rarely used, except in some parallel computations where advantages related to communication may outweigh the disadvantage of instability.

Experiment 3: numerical loss of orthogonality#

The qr function in numpy uses Householder triangularization, which is explained below. We test it on a matrix whose columns are almost parallel and compare with the GS algorithms.

A = np.array([[0.70000, 0.70711],
              [0.70001, 0.70711]])
Q, R = np.linalg.qr(A)
print('Householder  QR = ',np.abs(Q.T @ Q - np.eye(2)).max())
Q, R = mgs(A)
print('Modified  GS QR = ',np.abs(Q.T @ Q - np.eye(2)).max())
Q, R = clgs(A)
print('Classical GS QR = ',np.abs(Q.T @ Q - np.eye(2)).max())
Householder  QR =  2.220446049250313e-16
Modified  GS QR =  2.301438892308072e-11
Classical GS QR =  2.301438892308072e-11

The orthonormality of the columns of \(Q\) is not accurate to machine precision using the classical or modified GS algorithms (the two are identical for the \(2 \times 2\) case).
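
The size of this loss tracks the conditioning of \(A\): for modified Gram-Schmidt the loss of orthogonality is known to grow roughly like \(\kappa(A)\, \varepsilon_{mach}\). A quick check of the condition number here:

print(np.linalg.cond(A))   # large, roughly 1e5, since the columns are nearly parallel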

Householder triangularization#

The Householder method returns the full QR factorization. In the algorithm given in the book, \(R\) is computed in place of \(A\); here we make a copy and modify that instead, because we want to keep \(A\) for verification.

# We need a sign function with sign(0) = 1 (np.sign would give 0 at 0)
def sign(x):
    if x >= 0.0:
        return 1.0
    else:
        return -1.0

def house(A):
    # Algorithm 10.1: Householder triangularization; R overwrites a copy of A
    R = A.copy()
    m, n = A.shape
    V = []                                  # list of Householder reflectors
    for k in range(n):
        x = R[k:m,k]
        e1 = np.zeros_like(x); e1[0] = 1.0
        v = sign(x[0]) * np.linalg.norm(x) * e1 + x   # reflector direction
        v = v / np.linalg.norm(v); V.append(v)
        for j in range(k,n):                # apply reflector to columns k..n-1
            R[k:m,j] = R[k:m,j] - 2.0 * v * np.dot(v, R[k:m,j])
    
    # Algorithm 10.3
    # Compute Q as Q*I
    Q = np.empty((m,m))
    for i in range(m): # Compute i'th column of Q
        x = np.zeros(m)
        x[i] = 1.0
        for k in range(n-1,-1,-1):
            x[k:m] = x[k:m] - 2.0 * np.dot(V[k],x[k:m]) * V[k]
        Q[:,i] = x
        
    return Q, R

Let us test the above method.

m, n = 10, 5
A = 2 * np.random.rand(m,n) - 1
Q, R = house(A)

Check that \(A=QR\) is satisfied

np.abs(A - Q @ R).max()
np.float64(1.3322676295501878e-15)

Check that the columns of \(Q\) are orthonormal

np.abs(Q.T @ Q - np.eye(m)).max()
np.float64(8.881784197001252e-16)
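
In practice one often avoids forming \(Q\) explicitly: the reflectors alone suffice to apply \(Q^\top\) to a vector, which is Algorithm 10.2 in the book. A minimal sketch, assuming house were modified to also return the list V of reflectors:

def apply_Qt(V, b):
    # Apply the reflectors in forward order, so that x <- Q^T b
    x = b.copy()
    for k, v in enumerate(V):
        x[k:] = x[k:] - 2.0 * np.dot(v, x[k:]) * v
    return x

This is how least squares problems are solved with Householder QR: compute \(Q^\top b\) from the reflectors and back-substitute with \(R\), without ever building \(Q\).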

Experiment 3 revisited: numerical loss of orthogonality#

We repeat the previous test using our own implementation of Householder.

A = np.array([[0.70000, 0.70711],
              [0.70001, 0.70711]])
Q, R = house(A)
print('Householder QR = ', np.abs(Q.T @ Q - np.eye(2)).max())
Q, R = mgs(A)
print('Modified GS QR = ',np.abs(Q.T @ Q - np.eye(2)).max())
Householder QR =  1.1102230246251565e-16
Modified GS QR =  2.301438892308072e-11

Rank deficient case#

Finally, we apply the Householder method to a matrix with linearly dependent columns.

m, n = 4, 4
A = 2 * np.random.rand(m,n) - 1
A[:,2] = A[:,0] + A[:,1] # column 2 = column 0 + column 1, so rank(A) = 3
Q, R = house(A)
print("Q = \n", Q)
print("R = \n", R)
Q = 
 [[-0.3614825   0.62929573  0.00535279 -0.68795976]
 [-0.6491394   0.18892249  0.52405008  0.51797475]
 [-0.60537991 -0.2416831  -0.75285746  0.09115971]
 [-0.28541829 -0.71406194  0.3981815  -0.50010343]]
R = 
 [[-1.25800549e+00  1.16899179e+00 -8.90137024e-02 -9.66576646e-02]
 [ 0.00000000e+00  6.41501346e-01  6.41501346e-01  3.04786833e-01]
 [ 0.00000000e+00  0.00000000e+00  1.11022302e-16 -1.08460348e+00]
 [ 0.00000000e+00 -1.11022302e-16  0.00000000e+00  4.48415877e-01]]

Note that R[2,2] is almost zero (it is at the level of machine precision); in exact arithmetic it would vanish, reflecting the fact that \(A\) has rank 3.
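
Since the diagonal of \(R\) measures how much of each new column lies outside the span of the previous ones, a rough rank estimate can be obtained by counting the diagonal entries above a tolerance (a sketch, with an arbitrary threshold; for reliable rank detection, column-pivoted QR or the SVD is preferred):

tol = 1e-12 * np.abs(np.diag(R)).max()    # arbitrary relative threshold
print(np.sum(np.abs(np.diag(R)) > tol))   # expect 3, since A has rank 3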