\(\newcommand{\re}{\mathbb{R}}\)
QR factorization#
Given \(A \in \re^{m \times n}\) with \(m \ge n\) and \(\textrm{rank}(A)=n\), we want to write it as

$$
A = Q R
$$

where \(Q \in \re^{m \times n}\) has orthonormal columns, \(Q^\top Q = I_{n \times n}\), and \(R \in \re^{n \times n}\) is an upper triangular matrix.
%config InlineBackend.figure_format = 'svg'
import numpy as np
import matplotlib.pyplot as plt
import sys
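Before implementing our own routines, here is a quick illustration of the reduced factorization defined above, using numpy's built-in np.linalg.qr on a small fixed matrix (shown only to fix notation; the routines developed below compute the same factorization).

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
Q, R = np.linalg.qr(A, mode='reduced')    # Q is 3x2 with orthonormal columns, R is 2x2 upper triangular
print(np.abs(Q.T @ Q - np.eye(2)).max())  # columns of Q are orthonormal
print(R)                                  # R is upper triangular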
Classical Gram-Schmidt algorithm#
We now implement the Gram-Schmidt algorithm.
def clgs(A):
    m, n = A.shape
    Q, R = np.empty((m,n)), np.zeros((n,n))
    for j in range(n):
        v = A[:,j]
        # Orthogonalize column j against the previously computed q_i
        for i in range(j):
            R[i,j] = Q[:,i].dot(A[:,j])
            v = v - R[i,j] * Q[:,i]
        # Normalize to get q_j
        R[j,j] = np.linalg.norm(v)
        Q[:,j] = v / R[j,j]
    return Q, R
Generate a random matrix and compute the decomposition
m, n = 10, 5
A = 2 * np.random.rand(m,n) - 1
Q, R = clgs(A)
Test that \(A = Q R\) is satisfied by computing \(\max_{i,j}| (A - Q R)_{ij} |\)
np.abs(A - Q @ R).max()
np.float64(1.1102230246251565e-16)
Are the columns of \(Q\) orthonormal, i.e., \(Q^\top Q = I_{n\times n}\) ?
np.abs(Q.T @ Q - np.eye(n)).max()
np.float64(2.302070312656211e-16)
Modified Gram-Schmidt#
We now implement the modified Gram-Schmidt algorithm.
def mgs(A):
    m, n = A.shape
    Q, R = np.empty((m,n)), np.zeros((n,n))
    V = A.copy() # A is not changed
    for i in range(n):
        # Normalize column i to get q_i
        R[i,i] = np.linalg.norm(V[:,i])
        Q[:,i] = V[:,i] / R[i,i]
        # Remove the q_i component from all remaining columns
        for j in range(i+1,n):
            R[i,j] = Q[:,i].dot(V[:,j])
            V[:,j] = V[:,j] - R[i,j] * Q[:,i]
    return Q, R
Let us test it, following the same approach as above.
Q, R = mgs(A)
Test that \(A = Q R\) is satisfied.
np.abs(A - Q @ R).max()
np.float64(2.220446049250313e-16)
Are the columns of \(Q\) orthonormal, i.e., \(Q^\top Q = I_{n\times n}\) ?
np.abs(Q.T @ Q - np.eye(n)).max()
np.float64(2.53397840516755e-16)
Remark: We allocated memory for a matrix \(V\), which is not necessary. We can use a common array for both \(V\) and \(Q\):
Q, R = A.copy(), np.zeros((n,n))
V = Q # V is pointer to Q
If we are okay with modifying the matrix \(A\), then we can use \(A\) itself to store both \(V\) and \(Q\):
Q, R = A, np.zeros((n,n)) # Q is pointer to A
V = Q # V is pointer to A
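Putting this remark together, here is a sketch of a memory-saving variant (the name mgs_inplace is ours, not from the book) that overwrites its argument with \(Q\) and only allocates \(R\); it assumes the caller no longer needs \(A\).

def mgs_inplace(A):
    # Overwrites A with Q; only R is newly allocated
    m, n = A.shape
    R = np.zeros((n,n))
    for i in range(n):
        R[i,i] = np.linalg.norm(A[:,i])
        A[:,i] = A[:,i] / R[i,i]               # column i of A now holds q_i
        for j in range(i+1,n):
            R[i,j] = A[:,i].dot(A[:,j])
            A[:,j] = A[:,j] - R[i,j] * A[:,i]
    return A, R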
Experiment 1: Discrete Legendre polynomials#
x = np.arange(-128,129)/128                    # 257 equispaced points in [-1,1]
A = np.column_stack((x**0, x**1, x**2, x**3))  # Vandermonde-type matrix of monomials 1, x, x^2, x^3
Q, R = np.linalg.qr(A,mode='reduced')
scale = Q[-1,:]                                # rescale so each column equals 1 at x = 1
Q = Q @ np.diag(1/scale)
plt.plot(Q);                                   # the columns approximate the Legendre polynomials P_0,...,P_3
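As a rough check (not part of the book's experiment), we can compare these columns with the exact Legendre polynomials evaluated on the same grid, using numpy.polynomial.legendre; the agreement is close but not exact, since orthogonality here is with respect to the discrete inner product on 257 points rather than the continuous one.

from numpy.polynomial.legendre import Legendre

# Exact Legendre polynomials P_0,...,P_3; they satisfy P_k(1) = 1,
# matching the scaling Q[-1,:] = 1 used above
P = np.column_stack([Legendre.basis(k)(x) for k in range(4)])
print(np.abs(Q - P).max())   # small, but not machine precision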
Experiment 2: Classical vs modified Gram-Schmidt#
First we construct a square matrix \(A\) with random singular vectors and widely varying singular values spaced by factors of 2 between \(2^{-1}\) and \(2^{-80}\).
Generate random unitary matrices \(U,V\) using QR factorization,
U, _ = np.linalg.qr(np.random.rand(80,80))
V, _ = np.linalg.qr(np.random.rand(80,80))
S = np.diag(2.0**np.arange(-1,-81,-1))
A = U @ S @ V
Now we use Algorithm 7.1 (classical Gram-Schmidt) and Algorithm 8.1 (modified Gram-Schmidt) to compute the QR factorization.
Qc, Rc = clgs(A)
Qm, Rm = mgs(A)
We plot the diagonal elements \(r_{jj}\) produced by the two algorithms. Since \(r_{jj} = \| P_j a_j \|\), where \(P_j\) is the projector onto the space orthogonal to \(q_1,\dots,q_{j-1}\), this gives the size of the projection at each step.
eps = 0.5 * sys.float_info.epsilon
plt.semilogy(np.diag(Rc),'o',label='Classical GS')
plt.semilogy(np.diag(Rm),'*',label='Modified GS')
plt.semilogy(2.0**np.arange(-1,-81,-1),'--',label='$2^{-j}$')
plt.semilogy([0,80],2*[np.sqrt(eps)],'--')
plt.semilogy([0,80],2*[eps],'--')
plt.xlim([0,95])
plt.text(82, 0.5*np.sqrt(eps), "$\\sqrt{\\varepsilon_{mach}}$",fontsize=12)
plt.text(82, 0.5*eps, "$\\varepsilon_{mach}$",fontsize=12)
plt.xlabel("j"); plt.ylabel("$r_{jj}$")
plt.title("Diagonal entries of R")
plt.legend();
The \(r_{jj}\) decrease with \(j\), closely following the line \(2^{-j}\). They are not exactly equal to the singular values of \(A\), but are reasonably good approximations to them. This can be seen as follows. The SVD of \(A\) can be written as

$$
A = \sum_{i=1}^{80} \sigma_i u_i v_i^\top, \qquad \sigma_i = 2^{-i}
$$

where \(u_j\) and \(v_j\) are the left and right singular vectors of \(A\), respectively. The \(j\)'th column of \(A\) is then

$$
a_j = \sum_{i=1}^{80} \sigma_i \bar{v}_{ji} u_i
$$

where \(\bar{v}_{ji}\) denotes the \(j\)'th component of \(v_i\). Since the singular vectors are random, we can expect the numbers \(\bar{v}_{ji}\) to all be of similar magnitude, on the order of \(80^{-1/2} \approx 0.1\).
When we take the QR factorization, the first vector \(q_1\) is likely to be approximately equal to \(u_1\), with \(r_{11} \approx \sigma_1 |\bar{v}_{11}|\) on the order of \(2^{-1} \times 80^{-1/2}\). Orthogonalization at the next step will yield a second vector \(q_2\) approximately equal to \(u_2\), with \(r_{22}\) on the order of \(2^{-2} \times 80^{-1/2}\), and so on.
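A rough numerical check of this heuristic, using the matrices generated above (the exact numbers depend on the random draw):

print(np.abs(V).mean(), 1/np.sqrt(80))  # entries of the unitary factor are typically of size ~ 80^{-1/2}
print(np.diag(Rm)[:5])                  # first few r_jj; roughly 80^{-1/2} times 2^{-1},...,2^{-5}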
In the computation, the decrease of \(r_{jj}\) does not continue all the way to \(j=80\). This is a consequence of rounding errors on the computer. With the classical Gram-Schmidt algorithm, the numbers never become smaller than about \(10^{-8}\). With the modified Gram-Schmidt algorithm, they shrink eight orders of magnitude further, down to about \(10^{-16}\), which is on the order of machine precision.
Classical Gram-Schmidt is known to be unstable. It is rarely used, except in some parallel computations where advantages related to communication may outweigh the disadvantage of its instability.
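Despite the inaccurate small \(r_{jj}\), both algorithms still reproduce \(A\) itself to near machine precision; it is the orthogonality of \(Q\), not the residual \(A - QR\), that suffers. A quick check using the factors computed above:

print('Classical GS residual: ', np.abs(A - Qc @ Rc).max())
print('Modified  GS residual: ', np.abs(A - Qm @ Rm).max())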
Experiment 3: numerical loss of orthogonality#
The qr function in numpy uses Householder triangularization, which is explained below. We test it on a matrix whose columns are almost parallel and compare with the GS algorithms.
A = np.array([[0.70000, 0.70711],
[0.70001, 0.70711]])
Q, R = np.linalg.qr(A)
print('Householder QR = ',np.abs(Q.T @ Q - np.eye(2)).max())
Q, R = mgs(A)
print('Modified GS QR = ',np.abs(Q.T @ Q - np.eye(2)).max())
Q, R = clgs(A)
print('Classical GS QR = ',np.abs(Q.T @ Q - np.eye(2)).max())
Householder QR = 2.220446049250313e-16
Modified GS QR = 2.301438892308072e-11
Classical GS QR = 2.301438892308072e-11
The orthonormality of the columns of \(Q\) is not accurate to machine precision with the classical or modified GS algorithm (the two are identical for a \(2 \times 2\) matrix, since the inner orthogonalization loop runs at most once).
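The size of this loss is consistent with the conditioning of \(A\): a common rule of thumb is that Gram-Schmidt loses orthogonality on the order of \(\varepsilon_{mach}\, \kappa(A)\). A rough check (this is a heuristic, not a sharp bound):

kappa = np.linalg.cond(A)
print(kappa, np.finfo(float).eps * kappa)  # comparable in magnitude to the GS losses printed above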
Householder triangularization#
The Householder method returns the full QR factorization. In the algorithm given in the book, \(R\) is computed in place of \(A\); here we first make a copy of \(A\) and modify that, because we want to keep \(A\) for verification.
# We need a sign function with sign(0) = 1; np.sign(0) would return 0
def sign(x):
    if x >= 0.0:
        return 1.0
    else:
        return -1.0

def house(A):
    # Algorithm 10.1: Householder triangularization
    R = A.copy()
    m, n = A.shape
    V = []   # list of Householder vectors
    for k in range(n):
        x = R[k:m,k]
        e1 = np.zeros_like(x); e1[0] = 1.0
        v = sign(x[0]) * np.linalg.norm(x) * e1 + x
        v = v / np.linalg.norm(v); V.append(v)
        # Apply the reflector to the remaining columns
        for j in range(k,n):
            R[k:m,j] = R[k:m,j] - 2.0 * v * np.dot(v, R[k:m,j])
    # Algorithm 10.3
    # Compute Q as Q*I, column by column
    Q = np.empty((m,m))
    for i in range(m): # Compute i'th column of Q
        x = np.zeros(m)
        x[i] = 1.0
        for k in range(n-1,-1,-1):
            x[k:m] = x[k:m] - 2.0 * np.dot(V[k],x[k:m]) * V[k]
        Q[:,i] = x
    return Q, R
Let us test the above method.
m, n = 10, 5
A = 2 * np.random.rand(m,n) - 1
Q, R = house(A)
Check that \(A=QR\) is satisfied
np.abs(A - Q @ R).max()
np.float64(1.3322676295501878e-15)
Check that the columns of \(Q\) are orthonormal
np.abs(Q.T @ Q - np.eye(m)).max()
np.float64(8.881784197001252e-16)
Experiment 3 revisited: numerical loss of orthogonality#
We repeat the previous test using our own implementation of Householder.
A = np.array([[0.70000, 0.70711],
[0.70001, 0.70711]])
Q, R = house(A)
print('Householder QR = ', np.abs(Q.T @ Q - np.eye(2)).max())
Q, R = mgs(A)
print('Modified GS QR = ',np.abs(Q.T @ Q - np.eye(2)).max())
Householder QR = 1.1102230246251565e-16
Modified GS QR = 2.301438892308072e-11
Rank deficient case#
m, n = 4, 4
A = 2 * np.random.rand(m,n) - 1
A[:,2] = A[:,0] + A[:,1] # column 2 = column 0 + column 1, so A is rank deficient
Q, R = house(A)
print("Q = \n", Q)
print("R = \n", R)
Q =
[[-0.3614825 0.62929573 0.00535279 -0.68795976]
[-0.6491394 0.18892249 0.52405008 0.51797475]
[-0.60537991 -0.2416831 -0.75285746 0.09115971]
[-0.28541829 -0.71406194 0.3981815 -0.50010343]]
R =
[[-1.25800549e+00 1.16899179e+00 -8.90137024e-02 -9.66576646e-02]
[ 0.00000000e+00 6.41501346e-01 6.41501346e-01 3.04786833e-01]
[ 0.00000000e+00 0.00000000e+00 1.11022302e-16 -1.08460348e+00]
[ 0.00000000e+00 -1.11022302e-16 0.00000000e+00 4.48415877e-01]]
Note that R[2,2] is almost zero.
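Even so, the factorization itself is fine: \(Q\) is orthogonal and \(A = QR\) holds to machine precision; the near-zero diagonal entry is what signals the rank deficiency. A quick check using the matrices from this example:

print(np.abs(A - Q @ R).max())            # factorization still holds
print(np.abs(Q.T @ Q - np.eye(m)).max())  # Q is still orthogonal
print(np.linalg.matrix_rank(A))           # numerical rank, expected to be 3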