Numerical Properties of LU Factorization


Deriving error bounds for LU factorization is not done in P573; that is the task of a numerical analysis course. The practical computational implications of the error analyses are stressed here.


Matrix Norms

Expressing the error properties of LU factorization requires matrix norms. Let ε be machine precision; for 64-bit double precision numbers this means ε ≈ 2.2 × 10−16.

Recall the various norms of an n-vector x:

||x||1 = Σi |xi|,   ||x||2 = (Σi xi2)1/2,   ||x||∞ = maxi |xi|

The norm of a matrix is also defined, but can be more complex. For an m × n matrix A, one common example is the Frobenius norm ||A||F = (Σi Σj aij2)1/2.

As mentioned earlier, ||AB|| ≤ ||A|| ||B|| does not hold for all matrix norms. If a norm does satisfy that property, it is called a consistent norm.

The matrix 2-norm is defined using vector norms, and the other matrix p-norms can be defined the same way. More generally, for p = 1, 2, and ∞ define

||A||p = max ||Ax||p / ||x||p

where the maximum is taken over all x ≠ 0. Heuristically, ||A||p is the maximum amount by which A can "stretch" a vector. Slight digression: the p-norm for both vectors and matrices can be defined for any p such that 1 ≤ p ≤ ∞, so there are uncountably many different norms. Norms defined as above are called operator norms in functional analysis.
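The 1- and ∞-norms above have simple closed forms for matrices (maximum absolute column sum and maximum absolute row sum, respectively). A minimal sketch using only the standard library; the function names here are our own, not from any library:

```python
# Vector p-norms and the matrix 1- and infinity-norms, pure standard library.
import math

def vec_norm(x, p):
    """p-norm of a vector for p = 1, 2, or math.inf."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def mat_norm_1(A):
    """Matrix 1-norm: maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def mat_norm_inf(A):
    """Matrix infinity-norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

A = [[1.0, -2.0],
     [3.0,  4.0]]
x = [1.0, -1.0]
Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
# The operator-norm definition guarantees ||Ax||_p <= ||A||_p ||x||_p:
assert vec_norm(Ax, 1) <= mat_norm_1(A) * vec_norm(x, 1)
print(mat_norm_1(A), mat_norm_inf(A))  # 6.0 7.0
```

The "stretch" interpretation is visible in the assertion: A can magnify a vector's norm by at most ||A||p.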


Condition numbers

LU factorization was the original motivation for numerical analysis back before Egypt built the pyramids (or around then). Results generally use the condition number k(A) of A, defined for nonsingular matrices as

k(A) = ||A|| ||A−1||
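For a 2 × 2 matrix the inverse can be written out explicitly, which makes the definition of k(A) directly computable. A sketch in the ∞-norm, standard library only; the helper names are our own:

```python
# Condition number k(A) = ||A|| ||A^-1|| in the infinity-norm for a 2x2 matrix.
def norm_inf(A):
    return max(sum(abs(v) for v in row) for row in A)

def inv2(A):
    """Explicit inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

def cond_inf(A):
    return norm_inf(A) * norm_inf(inv2(A))

well = [[2.0, 1.0], [1.0, 2.0]]         # comfortably nonsingular
ill  = [[1.0, 1.0], [1.0, 1.0000001]]   # nearly singular
print(cond_inf(well))   # about 3
print(cond_inf(ill))    # tens of millions
```

Note how a matrix that is only a whisker away from singular gets a huge condition number, while its entries look perfectly tame.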

Notes about k(A):

- k(A) ≥ 1 for any operator norm, since 1 = ||I|| = ||A A−1|| ≤ ||A|| ||A−1||.
- k(A) depends on which norm is used, so strictly it should be written kp(A) for the p-norm.
- k(A) is large when A is close to a singular matrix; such a matrix is called ill-conditioned.


LU factorization error analysis: perturbation result:

For this error bound, a perturbation to a matrix A is denoted ΔA. Perturbations occur as a result of representing real numbers as finite precision machine numbers, or from non-exact arithmetic operations. ΔA is a single matrix, not the product Δ*A (or the Laplace operator applied to A). So think of ΔA as a small change in the matrix A, of order 10−16 in double precision arithmetic. The same holds for Δb as a small perturbation of the vector b.

Let x be the exact solution to Ax = b, where A is nonsingular and b is given. If ||ΔA|| ||A−1|| = r < 1, then the perturbed system (A + ΔA)y = b + Δb is nonsingular with a unique solution y. Furthermore, if h is chosen so that ||ΔA|| < h ||A|| and ||Δb|| < h ||b||, then

|| x − y || < 2 h k(A) ||x|| / (1−r)

Big Fat Hairy Deal.
An equation (OK, inequality) like this isn't worth diddly-squat unless you can understand and use it. So mentally, let's take this apart: k(A) ≥ 1, and it is much larger than 1 for an ill-conditioned matrix. The term r is the product of the size of the perturbation of A and the norm ||A−1||, and that norm measures how much the inverse of A can stretch a vector. Requiring r < 1 means the perturbation is small enough to overcome any tendency of A−1 to blow up a vector it multiplies. If r is small enough, the denominator 1 − r is close to 1 and can be neglected. The number h bounds the sizes of the perturbations (||ΔA|| and ||Δb||) relative to the quantities they perturb (||A|| and ||b||). So the perturbation result gives an upper bound on

||x − y|| / ||x||,

which is the relative error of the computed solution y from the exact solution x. If things aren't too perverse, that bound boils down to 2 h k(A).

So the perturbation bound depends on k(A), and if that is large then the relative error in the computed solution may be large — unless the perturbations on the matrix A and right-hand side vector b can be made small to offset k(A).
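The theorem can be watched in action on a tiny system. A sketch using only the standard library: the 2 × 2 systems are solved exactly via the explicit inverse, and the perturbation sizes are chosen by hand to satisfy the hypotheses; all names here are our own:

```python
# Checking the perturbation bound ||x - y|| / ||x|| <= 2 h k(A) / (1 - r).
def norm_v(x): return max(abs(v) for v in x)
def norm_m(A): return max(sum(abs(v) for v in row) for row in A)
def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]
def matvec(A, x): return [sum(a * v for a, v in zip(row, x)) for row in A]

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = matvec(inv2(A), b)                    # exact solution of A x = b

h  = 1e-8                                 # relative perturbation size
dA = [[4e-8, 0.0], [0.0, 3e-8]]           # ||dA||_inf = 4e-8 < h ||A||_inf = 5e-8
db = [0.5e-8, 1.0e-8]                     # ||db||_inf = 1e-8 < h ||b||_inf = 2e-8
Ap = [[A[i][j] + dA[i][j] for j in range(2)] for i in range(2)]
bp = [b[i] + db[i] for i in range(2)]
y  = matvec(inv2(Ap), bp)                 # solution of the perturbed system

r     = norm_m(dA) * norm_m(inv2(A))      # r < 1, as the theorem requires
kA    = norm_m(A) * norm_m(inv2(A))       # condition number, about 2.27
err   = norm_v([xi - yi for xi, yi in zip(x, y)]) / norm_v(x)
bound = 2 * h * kA / (1 - r)
assert err <= bound                       # relative error within the bound
```

Because this A is well-conditioned, the actual relative error comes in well under the bound; rerunning with a nearly singular A makes both blow up.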

But Wait, There's More (Part I)
This gives an upper bound on the relative error, but a computed solution may be much more accurate than the inequality implies. In some cases (not LU factorization, however) a computed solution can be accurate to machine tolerance even when the condition number is infinite and the matrix is actually singular. For solving linear systems of equations, "the" error vector is x − y, where x and y are the exact and computed solutions. If the matrix is singular, there is no single exact solution. In that case, the accuracy of a computed or approximate solution can be measured by the residual vector b − Ay. If y nearly solves the linear system, then ||b − Ay|| should be near zero. So the perturbation result is not the final complete One True Bound.
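The residual is cheap to compute because it needs only A, b, and the candidate solution y, never the exact x. A minimal sketch, standard library only:

```python
# Measuring a computed solution by its residual b - Ay, not by the
# (unknowable without the exact answer) error x - y.
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 4.0]
y = [1.0, 1.0]                        # candidate solution (happens to be exact)
residual = [bi - ai for bi, ai in zip(b, matvec(A, y))]
print(max(abs(v) for v in residual))  # 0.0: y solves the system exactly
```

A small residual does not by itself guarantee a small error x − y when the matrix is ill-conditioned; it guarantees only that y exactly solves a nearby system.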
But Wait, There's More (Part II)
The bound above depends on the condition number of the matrix. That in turn requires knowing ||A−1||. How can you compute that without knowing A−1? And if you know A−1, then why not just compute x = A−1*b and not waste time and computation fooling around with LU factorization, triangular solves, etc.?

One answer is that you can get estimates of the condition number from some numerical linear algebra libraries. But in typical Ouroboros fashion, they often base the estimate of ||A−1|| on computing the LU factorization of A and then dynamically choosing right hand side vectors u to get maximal blow up in the solution v to LUv = u. But if the computed L and U are wildly inaccurate, then doesn't that mean the estimated ||A−1|| is also wildly inaccurate? That means some bound needs to be provided not only on the computed solution vector, but also on the computed LU factors ....
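The estimation idea can be sketched crudely: probe ||A−1|| by solving Av = u for several right hand sides u and recording the largest blow-up ||v|| / ||u||. A real estimator (e.g. Hager's algorithm, used in LAPACK) chooses the vectors u adaptively rather than randomly, and uses the LU factors for the solves; here, as an assumption for illustration, random sampling and an explicit 2 × 2 inverse stand in for both:

```python
# Crude condition estimation: lower-bound ||A^-1||_inf by sampled solves.
import random

def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def solve(A, u):                       # in practice: LU + two triangular solves
    return [sum(m * x for m, x in zip(row, u)) for row in inv2(A)]

def est_inv_norm(A, trials=200, seed=0):
    rng = random.Random(seed)
    best = 0.0
    for _ in range(trials):
        u = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        v = solve(A, u)
        best = max(best, max(map(abs, v)) / max(map(abs, u)))
    return best                        # a lower bound on ||A^-1||_inf

A = [[1.0, 1.0], [1.0, 1.0001]]        # nearly singular
print(est_inv_norm(A))                 # large: the solves blow up badly
```

Each probe gives only a lower bound on ||A−1||, which is why production estimators steer u toward the worst-case direction instead of sampling blindly.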


LU factorization accuracy result:

Define |G| as the matrix G with all of its entries replaced by their absolute values (the Matlab function abs(G) computes this element-by-element absolute value). |G| is not a norm, because it returns a matrix, not a real number. Unfortunately some (mostly older) textbooks and papers use the notation |G| for the norm of G. In these post-pyramidal times, that notation is mostly deprecated, but beware of it in publications, textbooks, or online materials for other courses.

Let ε be machine tolerance; for double precision arithmetic this is approximately 2.2 × 10−16. Let L and U be the factors computed by any of the algorithms covered, and let x be the solution computed using the permutation matrix and the two triangular solves. Then (A + E) x = b, where

|E| ≤ n ε (3|A| + 5|L| |U|) + O(ε2)

This is a different creature than the perturbation result.
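The factorization part of the bound can be checked numerically: compute L and U, recompose them, and compare |LU − A| entry by entry against n ε (3|A| + 5|L| |U|). A sketch using only the standard library (the full theorem also accounts for the triangular solves, which this check omits):

```python
# Entrywise check of the backward-error bound on a tiny LU factorization.
import sys

def lu_nopivot(A):
    """LU factorization without pivoting (fine here; real codes pivot)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

A = [[3.0, 1.0], [1.0, 2.0]]
L, U = lu_nopivot(A)
n, eps = len(A), sys.float_info.epsilon
LU = [[sum(L[i][k] * U[k][j] for k in range(n)) for j in range(n)]
      for i in range(n)]
for i in range(n):
    for j in range(n):
        E_ij  = abs(LU[i][j] - A[i][j])   # entry of |E| = |LU - A|
        absLU = sum(abs(L[i][k]) * abs(U[k][j]) for k in range(n))
        assert E_ij <= n * eps * (3 * abs(A[i][j]) + 5 * absLU)
```

The |L| |U| term is the interesting one: if the factorization produces huge entries in L or U (growth), the bound loosens accordingly, which is exactly why pivoting matters.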

Big Fat Hairy Deal Redux
Again, try to make sense of the inequality instead of treating it as a dead skunk to be routed to someone you don't like. This bound has a factor of n, so if the linear system is of order one billion = 109, potentially 9 significant digits are lost. [Linear systems of order 1010 and larger are solved routinely in scientific computing.] The factor ε ≈ 10−16 compensates for this enormous growth of the error bound, but the numbers mean that a computation with double precision numbers and arithmetic can potentially give results with only about 8 significant digits, about the same as single precision floating point numbers.

Even if E is small, that does not mean the computed solution x is close to the exact solution. If the condition number of A is large, that means the matrix is close to a singular matrix. That in turn means A+E could be singular, and the computed x may have zero correct digits. This is the distinction between the condition of a problem and the stability of an algorithm and its implementation.

The good news is that A + E being exactly singular would be an astonishing stroke of luck, vastly more unlikely than buying a winning lottery ticket every day until the heat death of the universe brings it all to a stop.