QR Factorization and Linear Least Squares Problems

The methods given for solving nonsingular linear systems of equations relied upon factoring the coefficient matrix into a product of simpler matrices, each of which makes a linear system easy to solve. The LU factorization of a matrix A was

A = PᵀLU

where P is a permutation matrix and L and U are triangular matrices. For each of those three factors it is trivial to solve a linear system. For least squares the normal equations can be used, but solving them can be numerically inaccurate.

The modern approach is to factor the rectangular m × n coefficient matrix A into a product of diagonal, triangular, and orthogonal matrices. Orthogonal matrices are important in least squares methods because multiplying a vector by an orthogonal matrix does not change its two-norm. The first factorization method uses the QR factorization A = QR, where Q is orthogonal and R is upper triangular. Don't forget: orthogonal implies that Q is square, and it is easy to solve a linear system with an orthogonal coefficient matrix since its inverse is its transpose: Q⁻¹ = Qᵀ. Given that A is m × n and Q is orthogonal, what are the dimensions of R? Do not proceed further until you have answered that question. Keeping track of the sizes of the vectors and matrices in least squares will circumvent half of all bugs in both your code and your thinking.
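A quick sanity check of that "inverse is the transpose" claim, sketched in NumPy (the tool and the random test data are my choices, not part of the development above):

    import numpy as np

    # For an orthogonal Q, x = Q^T b solves Q x = b:
    # no factorization or explicit inverse is needed.
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random 5 x 5 orthogonal Q
    b = rng.standard_normal(5)
    x = Q.T @ b
    print(np.allclose(Q @ x, b))    # True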

In theory the QR factorization is not always guaranteed to exist without some additional work. In practice, the QR factorization seems to work in most applications that need an LLS solution.

One issue in analyzing QR factorization is that the matrix Q is almost never explicitly formed or stored as a dense matrix. Instead Q is represented implicitly, mostly by data stored in the subdiagonal part of the array A. A load/store analysis needs to reflect that when defining "the minimal amount of data needed to store Q".


Short summary of orthogonality

For m ≥ k, an m × k matrix Q has orthonormal columns if the set of its columns {q₁, q₂, ... , qₖ} forms an orthonormal set of vectors. In this case, Q need not be square. Also QᵀQ = Iₖ, but unless k = m, QQᵀ ≠ Iₘ.

Orthogonal matrices are square matrices with orthonormal columns, and so they satisfy QᵀQ = QQᵀ = I. This means that Qᵀ is also an orthogonal matrix, and is the inverse of Q. Furthermore, multiplying by orthogonal matrices does not change the two-norm of vectors or matrices. So if the matrix Q with columns qᵢ (i.e., Q = [q₁ q₂ … qₙ]) has qᵢᵀqⱼ = 0 for i ≠ j and || qᵢ || = 1 for i = 1:n, then || Qd || = || d || for any n-vector d.

More generally, if Q is m × n with m ≥ n and Q has orthonormal columns, then || Qw || = || w || for any vector w of length n. Notice that the two vectors in the last equation are of different lengths (Qw has m entries while w has n); it only says they have the same norm.
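A two-line numerical check of this, sketched in NumPy with a random thin Q:

    import numpy as np

    # A thin Q with orthonormal columns preserves the two-norm,
    # even though Qw and w have different lengths (7 vs 3 here).
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((7, 3)))   # Q is 7 x 3 with orthonormal columns
    w = rng.standard_normal(3)
    print(np.isclose(np.linalg.norm(Q @ w), np.linalg.norm(w)))   # True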

In particular, for a square m × m orthogonal Q, applying this norm invariance to the LLS residual vector r = b − Ax gives

minₓ || Ax − b ||₂ = minₓ || Qᵀ(Ax − b) ||₂

With LU factorization the goal was replacing A with the product of two triangular matrices L and U, making the problem easy to solve in two steps, solving a triangular system on each step. For least squares the game to play is to choose an orthogonal matrix that makes QᵀA have a useful form; it turns out that Q can be built so that QᵀA = R is an upper triangular matrix. Unfortunately the notation has shifted: for LU, U was the upper triangular factor, and now R is being used for the upper triangular factor. That notation is deeply embedded in linear algebra now, so these notes will also use R. One advantage of using a different symbol: in LU factorization U is square, while in least squares R is rectangular.


But how do you get R?

First notice that if W and V are orthogonal, so is their product W × V [as well as the products Wᵀ × V, Wᵀ × Vᵀ, W × Vᵀ, Vᵀ × W, Vᵀ × Wᵀ, and V × Wᵀ, so it is OK to go hog wild and transpose and commute all you want]. Not all of those products are orthogonal matrices (and some are not even defined) if W or V is not square, even if it has orthonormal columns.

So we need a sequence of orthogonal matrices to zero out the subdiagonal entries in A, similar to the way that a sequence of row operations was used to zero out the subdiagonal entries in LU factorization. Then multiplying them all together (in the right order and transposed where needed) gives a single orthogonal Qᵀ. In practice Q itself rarely needs to be explicitly created, a significant savings in time and flops. More importantly, avoiding the full Q gives a vast savings in memory when m >> n.

QR factorization: The goal is to factor A = QR, where Q is an m × n matrix with orthonormal columns and R is upper triangular. When A has full column rank, x* = R⁻¹Qᵀb solves the least squares problem ... do some side scribbling and convince yourself of that. A vector x solves the least squares problem if and only if it satisfies the normal equations AᵀAx = Aᵀb, so that's a good way to check the claim.
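After the side scribbling, a small NumPy sketch can back up the algebra numerically: build a random full-column-rank A, form the thin QR, and check both the normal equations and agreement with a library least squares solver (the random data and the use of NumPy are my additions for illustration):

    import numpy as np

    # Check: x = R^{-1} Q^T b from the thin QR of A satisfies A^T A x = A^T b.
    rng = np.random.default_rng(0)
    m, n = 8, 3
    A = rng.standard_normal((m, n))     # full column rank with probability 1
    b = rng.standard_normal(m)
    Q, R = np.linalg.qr(A)              # thin QR: Q is m x n, R is n x n
    x = np.linalg.solve(R, Q.T @ b)     # triangular solve; never form R^{-1} explicitly
    print(np.allclose(A.T @ A @ x, A.T @ b))                       # normal equations hold
    print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))    # agrees with lstsq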

Be wary of a possible naming confusion. The QR factorization is for solving least squares, but there is also a QR algorithm for computing eigenvalues. The QR algorithm is like a version of the power method on steroids, one that keeps a full set of approximate eigenvalues on each step instead of isolating a single eigenvalue/eigenvector pair. Because some publications call the computation of the QR factors "the QR algorithm" the opportunity for confusion is large. Just checking if it is in the context of least squares or of eigenvalue finding does not always pin down which is being referred to, because the QR algorithm repeatedly performs a QR factorization and then reverse multiplies the two factors: R×Q.
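To see the distinction concretely, here is a bare-bones sketch of the eigenvalue-finding QR algorithm: factor, reverse multiply R×Q, repeat. It is the unshifted version applied to a small symmetric matrix so the plain iteration settles down; real implementations add shifts and a Hessenberg reduction, so treat this only as an illustration of the basic loop.

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((4, 4))
    A0 = B @ B.T                        # symmetric test matrix
    A = A0.copy()
    for _ in range(200):
        Q, R = np.linalg.qr(A)          # a QR *factorization* on each step ...
        A = R @ Q                       # ... followed by the reverse multiply R*Q
    print(np.sort(np.diag(A)))          # approximate eigenvalues
    print(np.linalg.eigvalsh(A0))       # reference eigenvalues, for comparison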

If you do need a real eigenvalue solver for the full decomposition of a dense matrix, plenty of software is available for this and other linear system computations if your problems exceed local capabilities.


Algorithm for QR Factorization

The following details of how to compute a QR factorization are rudimentary; see Golub and Van Loan for more details and practical algorithms. This material is valuable mainly as another way of seeing how to take a basic algorithm that is naturally stated using BLAS-1 operations and progressively move it to higher-performance BLAS-2 and BLAS-3 versions.

The goal of any QR factorization is to introduce zeros into the subdiagonal part of A using orthogonal transformations. The two common ways of doing this are

  1. Givens rotations which zero out one entry at a time
  2. Householder transforms which zero out an entire column's subdiagonal in one step

Givens rotations: first, consider a vector of length two v = (a, b)ᵀ, where a and b are scalars. Let

\begin{displaymath}
G = \left[\begin{array}{cc} c & s \\ -s & c \end{array}\right]
\end{displaymath}

so that Gv = (*, 0)ᵀ, where * is to be determined. For the Givens rotation to be an orthogonal matrix, c and s must satisfy c² + s² = 1. The notation uses c and s because they are the cosine and sine of some angle θ, but θ itself is not needed or computed. To zero out the second component of v (= the number b), use

c = a/sqrt(a² + b²)
s = b/sqrt(a² + b²)

Plug those in and verify that G*v = (*, 0)ᵀ ... I may have the signs wrong in G.

Problem 1: numerical overflow or underflow can occur when squaring either a or b. The solution is to use the (co)tangent t instead of the cosine c and sine s:

If b = 0, set c = 1, s = 0. Otherwise,
    If |b| > |a|, set t = a/b, s = 1/sqrt(1+t²), and c = s×t
    If |a| ≥ |b|, set t = b/a, c = 1/sqrt(1+t²), and s = c×t
Then store the single number t instead of c and/or s. Because |t| ≤ 1 always holds, overflow won't occur when computing sqrt(1+t²).
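Written out as a small function, the recipe looks like the following sketch in Python (the name givens is just a placeholder, not a library routine, and the quick check at the end uses made-up numbers):

    import numpy as np

    def givens(a, b):
        # Compute (c, s) so that [[c, s], [-s, c]] @ [a, b]^T = [r, 0]^T,
        # following the overflow-avoiding recipe above.
        if b == 0.0:
            return 1.0, 0.0
        if abs(b) > abs(a):
            t = a / b                        # |t| <= 1 in this branch
            s = 1.0 / np.sqrt(1.0 + t * t)
            c = s * t
        else:
            t = b / a                        # |t| <= 1 in this branch
            c = 1.0 / np.sqrt(1.0 + t * t)
            s = c * t
        return c, s

    # Quick check that the second component really is zeroed out:
    c, s = givens(3.0, 4.0)
    print(np.array([[c, s], [-s, c]]) @ np.array([3.0, 4.0]))   # ~ [5, 0]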

All well and good, provided our vectors are of length 2. For a larger matrix A = [aᵢⱼ], to zero out aⱼᵢ using aᵢᵢ, let

\begin{displaymath}
Q_{ij} = \left[\begin{array}{ccccccc}
1 &        &        &        &        &        &   \\
  & \ddots &        &        &        &        &   \\
  &        & c      & \cdots & s      &        &   \\
  &        & \vdots &        & \vdots &        &   \\
  &        & -s     & \cdots & c      &        &   \\
  &        &        &        &        & \ddots &   \\
  &        &        &        &        &        & 1
\end{array}\right]
\end{displaymath}

where the 2 × 2 matrix

\begin{displaymath}
G = \left[\begin{array}{cc} c & s \\ -s & c \end{array}\right]
\end{displaymath}

will have G*(aᵢᵢ, aⱼᵢ)ᵀ = (*, 0)ᵀ. The entries of G sit at the intersections of rows i and j with columns i and j of Qᵢⱼ, and every other entry of Qᵢⱼ matches the identity matrix. At this point you should know what the value of |*| is.

Unlike the row operations in LU factorization, where one row introduces a zero into a row below it and only that lower row changes, applying Qᵢⱼ to A updates both of the rows i and j. Using a temporary array u, the update algorithm is

        u(1:n)   =  A(i,1:n)
        A(i,1:n) =  c*u(1:n) + s*A(j,1:n)
        A(j,1:n) = −s*u(1:n) + c*A(j,1:n)
Doing the entire QR factorization requires zeroing out all of the subdiagonal entries in A, so loop over the rows and columns, annihilating one entry at a time:
        for k = 1:n
            for i = k+1:m
                compute c, s to zero out A(i,k) using A(k,k)
                Let Q = the Givens matrix given by c, s, i, k
                A = Q*A
            end for
        end for
Note that computing A = Q*A, with Q the Givens matrix from the loop, has some computational savings because the two affected rows are already guaranteed to have zeros in their first k−1 positions, and all the other rows of A are unchanged.
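Putting the pieces together, here is a plain NumPy sketch of the whole double loop (the name givens_qr is just a placeholder). For readability it uses c = a/r, s = b/r with np.hypot, which already avoids the overflow issue, rather than the tangent trick. It also accumulates Q explicitly, purely so the factorization can be checked; as noted above, production code avoids forming Q.

    import numpy as np

    def givens_qr(A):
        R = np.array(A, dtype=float)
        m, n = R.shape
        Qt = np.eye(m)                       # will hold the product of all rotations, i.e. Q^T
        for k in range(n):
            for i in range(k + 1, m):
                a, b = R[k, k], R[i, k]
                if b == 0.0:
                    continue                 # entry already zero, nothing to do
                r = np.hypot(a, b)           # sqrt(a^2 + b^2) without overflow
                c, s = a / r, b / r
                # update only rows k and i, as in the two-row update above
                for M in (R, Qt):
                    u = M[k, :].copy()
                    M[k, :] = c * u + s * M[i, :]
                    M[i, :] = -s * u + c * M[i, :]
        return Qt.T, R                       # A = Q R with Q = (Q^T)^T

    # Check on a small random matrix
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))
    Q, R = givens_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0.0))   # True True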

Problem 2: A Givens rotation can introduce only one zero at a time into the matrix. While this is an advantage for sparse matrices (ones consisting mainly of zeros anyway), it is slow for dense matrices.

Problem 3: Applying a Givens matrix to A involves two operations similar to a saxpy: v = αv + βw. The BLAS routines for this are drotg(), which computes a Givens rotation, and drot(), which applies it to a pair of vectors. However, since this decomposition is essentially a matrix-matrix operation (A = QR), a higher-level BLAS version should be possible. drotg() and drot() are level-1 BLAS because they operate only on scalars and vectors. The next step is to create a QR factorization that uses level-2 BLAS, and eventually a level-3 implementation.

Next: QR factorization via Householder reflectors


  • Updated: 04 Dec 2008
  • Modified: Mon 23 Aug 2010, 06:00 PM to merge multiple versions
  • Last Modified: Mon 04 Nov 2019, 07:19 AM