Linear Least Squares Solution Methods


The method for solving a nonsingular linear system of equations covered before was LU factorization. That won't work directly on LLS because the matrix A is now rectangular (and so cannot be nonsingular); in particular, for the overdetermined case in real applications, no vector x exists that solves Ax = b (although there is always at least one that minimizes the two-norm ||Ax − b||).

However, two common ways of solving LLS use similar lower/upper triangular factorizations on derived systems. The version of LU factorization beaten to death in earlier notes was for general nonsymmetric n×n matrices. When A is symmetric, only half of the matrix needs to be stored, and the factorization is modified to give a lower triangular matrix L such that

A = LDLT

where D is block diagonal with 1 x 1 and 2 x 2 blocks. Unsurprisingly this is called the LDL decomposition or LDL factorization.
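As a concrete sketch, SciPy provides this factorization as scipy.linalg.ldl; the matrix below is just an illustrative symmetric indefinite example:

```python
import numpy as np
from scipy.linalg import ldl

# A small symmetric indefinite matrix (illustrative values only).
A = np.array([[2.0,  1.0, 0.0],
              [1.0, -3.0, 1.0],
              [0.0,  1.0, 4.0]])

# scipy.linalg.ldl returns the lower factor, the block diagonal D
# (with 1x1 and 2x2 blocks), and the pivoting permutation.
L, D, perm = ldl(A, lower=True)

# The factors reproduce A:
assert np.allclose(L @ D @ L.T, A)
```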

In scientific and engineering computations, the most common special case for a symmetric matrix is when it is positive definite, defined as

dT A d > 0

for all nonzero n-vectors d. The reason this is common is because such linear systems arise in optimization. In first semester calculus, a function is minimized by taking its first derivative, setting it to zero, and solving for x. The Second Derivative Test says it is a minimum if the second derivative at that x is positive. For multidimensional calculus, the derivative is a vector (the gradient) and the second derivative is a matrix (the Hessian). In this case the Second Derivative Test is exactly the definition of positive definite.

A symmetric positive definite (spd) matrix has a Cholesky factorization

A = LLT

where the diagonal entries of L are all positive values. Obviously (to me at least) the matrix L here is not the same as the matrix L in the LDLT factorization. However, the LDL factorization of an spd matrix gives a diagonal matrix D with all diagonal entries positive. In that case, let S = sqrt(D) be the diagonal matrix of the positive square roots of the diagonal of D, and define E = L*S; then A = EET is the Cholesky factorization of A.
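This relationship is easy to check numerically. A sketch, assuming scipy.linalg.ldl is available and using an spd matrix built as B*BT plus a multiple of the identity (an illustrative construction, not from the notes):

```python
import numpy as np
from scipy.linalg import ldl

# Build an spd test matrix: B @ B.T is semidefinite, adding 4*I makes
# it safely positive definite.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)

# For an spd matrix the block diagonal D comes out diagonal and positive.
L, D, perm = ldl(A, lower=True)
S = np.sqrt(D)        # entrywise square roots of the diagonal D
E = L @ S             # E plays the role of the Cholesky factor

assert np.allclose(E @ E.T, A)

# Compare with the direct Cholesky factorization:
C = np.linalg.cholesky(A)
assert np.allclose(C @ C.T, A)
```

Note E may differ from C by the pivoting order chosen inside ldl, but both satisfy the factorization of A.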

Having a Cholesky factorization is also a necessary and sufficient condition for a real matrix A to be symmetric and positive definite. In fact, attempting the factorization and seeing whether it succeeds is the best practical way to determine numerically if a matrix is spd.
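That test is a few lines in NumPy; the function name is mine, and np.linalg.cholesky raises LinAlgError when the factorization fails:

```python
import numpy as np

def is_spd(A, atol=1e-12):
    """Test symmetric positive definiteness by attempting a Cholesky
    factorization; np.linalg.cholesky raises LinAlgError otherwise."""
    if not np.allclose(A, A.T, atol=atol):
        return False
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

assert is_spd(np.array([[2.0, 1.0], [1.0, 2.0]]))        # eigenvalues 3, 1
assert not is_spd(np.array([[1.0, 2.0], [2.0, 1.0]]))    # eigenvalues 3, -1
```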

If the dT A d > 0 condition is relaxed to only requiring dT A d ≥ 0 for all nonzero d, the matrix A is positive semidefinite. In this case the LDL factorization still exists, but some of the diagonal entries of D may be equal to 0.


Normal Equations Method

The easiest method is to solve the normal equations

AT A x = AT b

Even though A is rectangular, AT A is square, and for overdetermined LLS with A of full column rank it is nonsingular. AT A is also symmetric (easy to prove) and positive definite: semidefiniteness holds for any matrix A, since dT AT A d = ||Ad||2 ≥ 0, and definiteness holds when A has full column rank, since then Ad ≠ 0 for every nonzero d.

The algorithm for solving LLS this way is

  1. Compute G = AT A
  2. Compute the Cholesky factorization G = LLT
  3. Compute g = AT b
  4. Solve Ly = g for y
  5. Solve LTx = y for x
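The five steps above can be sketched directly in NumPy/SciPy; the function name is mine, and np.linalg.lstsq serves only as a reference for checking the answer:

```python
import numpy as np
from scipy.linalg import solve_triangular

def lls_normal_equations(A, b):
    """Solve min ||Ax - b||_2 via the normal equations (steps 1-5)."""
    G = A.T @ A                                  # 1. form the Gram matrix
    L = np.linalg.cholesky(G)                    # 2. G = L L^T
    g = A.T @ b                                  # 3. right-hand side
    y = solve_triangular(L, g, lower=True)       # 4. forward substitution
    x = solve_triangular(L.T, y, lower=False)    # 5. back substitution
    return x

# Overdetermined example; compare against NumPy's reference LLS solver.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x = lls_normal_equations(A, b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```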
Mathematically, in infinite precision arithmetic you're done. Computationally, however, the normal equations can take a perfectly reputable and well-conditioned matrix A and then work with a catastrophically ill-conditioned matrix AT A: the condition number gets squared, cond(AT A) = cond(A)2. Even the first step of forming G (computing its entries by multiplying out AT × A) can give you 0 digits of accuracy. Normal equations are commonly used in stopping tests, but have long been deprecated as a solution method in favor of others ... to follow.
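The squaring of the condition number is easy to see numerically. A sketch, constructing A with prescribed singular values (an illustrative setup) so cond(A) is about 1e8, which is tolerable in double precision, while cond(AT A) lands near the 1e16 limit:

```python
import numpy as np

# Build A = U diag(s) V^T with singular values spanning 8 orders of
# magnitude, so cond(A) is about 1e8.
rng = np.random.default_rng(2)
U, _ = np.linalg.qr(rng.standard_normal((10, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
s = np.array([1.0, 1e-2, 1e-4, 1e-8])
A = U @ np.diag(s) @ V.T

condA = np.linalg.cond(A)        # roughly 1e8: representable
condG = np.linalg.cond(A.T @ A)  # roughly 1e16: at the limit of doubles

assert condA < 1e9
assert condG > 1e13   # squared, up to rounding in forming A.T @ A
```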


Augmented System Method

Solve (via LU factorization or its variants for symmetric matrices) the augmented system with block 2×2 coefficient matrix:

    [ I    A ] [ r ]       [ b ]
    [ AT   0 ] [ x ]   =   [ 0 ]

The augmented system is simply a rewriting of the normal equations together with the definition of the residual vector r = b − Ax. The block 2×2 coefficient matrix is symmetric, but guaranteed not to be positive definite. Still, the LDLT factorization can be used. Unfortunately the linear system is now of size (m+n)×(m+n), significantly larger. How much? You can answer that one:

  1. What is the size of the identity matrix I and the zero matrix 0 in the augmented system? Knowing that A is m×n is enough to figure those sizes.
  2. Use the ballpark figures from class: m = 6000000 and n = 200000. Then see how much storage (in Gbytes) is needed for
    1. the normal equations matrix G
    2. the augmented system coefficient matrix
  3. Suppose that m = 30*n for more general n. What's the largest problem size n that could fit into 8, 32, or 64 Gbytes of memory?
The augmented system doesn't really require a full 8*(m+n)^2 bytes to store - in many cases advantage can be taken of its special structure. But get used to estimating the limits of each of these approaches, and relating them to available hardware.
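A dense sketch of the augmented system method (the function name is mine; assembling the full (m+n)×(m+n) matrix is only for illustration, since real codes exploit the block structure and a symmetric LDLT solver rather than general LU):

```python
import numpy as np

def lls_augmented(A, b):
    """Solve min ||Ax - b||_2 via the augmented system
    [[I, A], [A.T, 0]] [r; x] = [b; 0]."""
    m, n = A.shape
    K = np.block([[np.eye(m), A],
                  [A.T, np.zeros((n, n))]])
    rhs = np.concatenate([b, np.zeros(n)])
    sol = np.linalg.solve(K, rhs)   # general LU; LDL^T in practice
    r, x = sol[:m], sol[m:]
    return x, r

# Check against the reference solver, and that r really is the residual.
rng = np.random.default_rng(3)
A = rng.standard_normal((12, 3))
b = rng.standard_normal(12)
x, r = lls_augmented(A, b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
assert np.allclose(r, b - A @ x)
```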

Next: QR factorization via Givens rotations


  • Started: Tue 02 Nov 2010, 05:22 PM
  • Modified: Tue 09 Nov 2010, 10:38 AM to remove cruft
  • Last Modified: Mon 04 Nov 2019, 07:18 AM