Gaussian Elimination (LU Factorization) and the BLAS
The most commonly used direct method for solving general
linear systems is Gaussian elimination with partial pivoting,
which in modern terms is called LU decomposition with pivoting, or
LU factorization with pivoting.
You probably learned Gaussian elimination by adjoining the right
hand side vector b to the matrix, then performing row combinations
that would zero out the subdiagonal entries of the matrix A.
LU decomposition (or "LU factorization") does the
same operations, but ignores the right hand side vector until
after the matrix has been processed. In particular:
Let A be an n x n matrix, nonsingular.
Gaussian elimination with partial pivoting gives the
factorization PA = LU, where
- P is a permutation matrix, that is, it has
exactly one 1 in each row and column, and zeros elsewhere.
P-1 = PT,
and can be stored as an integer vector of length n.
- L is unit lower triangular matrix (ones on main diagonal,
zero above main diagonal)
- U is upper triangular matrix (zeros below the main diagonal)
Solving Ax = b then becomes LUx = Pb, since PAx=Pb. Three steps
can be taken in solving LUx = Pb.
- Set d = Pb, either
by shuffling entries of b, or by accessing via indirect
addressing using a permutation vector.
In setting d = Pb, we can sometimes overwrite
b with its permuted entries ... depending on how P is represented.
- Solve Ly = d (unit lower triangular system)
- Solve Ux = y (upper triangular system)
We look later in some detail at all three stages. However, note that
the triangular solves can be implemented by overwriting the right hand side
vectors with the solution as we go along,
rather than introducing new vectors y and d.
In that case the three steps look more parsimonious with space:
- Set x = Pb
- Solve Lx = x (unit lower triangular system)
- Solve Ux = x (upper triangular system)
and if the original right hand side vector b is not needed later on,
x can be replaced with b in all three steps.
The classical algorithm you probably learned (and need to unlearn) follows
a three phase sequence on each step: find the pivot, interchange rows to
bring the pivot to the diagonal position, scale the subdiagonal entries
in the column, then update the rest of the matrix.
Carrying out these operations
with overwriting of the array A gives the L
and U factors in the corresponding
parts of A (since L is unit lower triangular, we don't store its diagonal, so
everything fits tidily). Less obscurely, the algorithm without pivoting is:
for k = 1:n-1
for i = k+1:n
A(i,k) = A(i,k)/A(k,k)
end for
for i = k+1:n
for j = k+1:n
A(i,j) = A(i,j) - A(i,k)* A(k,j)
end for
end for
end for
We can tighten up the notation a little by using Matlab notation. Recall
- A(r:s,i) refers to rows r through s in column i,
- A(r,i:j) refers to columns i through j in row r,
- A(r:s,i:j) refers to rows r through s in columns i through j.
Note that (1) is a column vector, (2) is a row vector, while (3) is a 2D submatrix.
Now the algorithm above can be stated more succintly as
for k = 1:n-1
A(k+1:n,k) = A(k+1:n,k)/A(k,k)
A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)* A(k,k+1:n)
end for
In fact, we can tighten up the notation even more in Matlab using
indexing vectors :
for k = 1:n-1
K = k+1:n
A(K,k) = A(K,k)/A(k,k)
A(K,K) = A(K,K) - A(K,k)* A(k,K)
end for
Notice that this implies Matlab is case-sensitive (it is), and
follows a useful convention: use upper case letters for vectors used
for indexing, and lower case letters for scalars used for indexing.
In either of the last two forms,
this more clearly shows what the operations involved are.
Each iteration is a vector scaling of the subdiagonal part of column k,
followed by a rank-1 update of the trailing part of the matrix.
Graphically, the LU rank-1 update form procedure is:
The rank-1 update stinks as a computational kernel ... check its
memory reference to flop ratio. A more efficient version comes via
a common idea in computer science (where it is called lazy evaluation).
In numerical linear algebra it has a longer history, and is called
"deferred updates". This means apply the updating information to a part
of the matrix only when it is actually needed. The algorithm results by
equating parts of A, L, and U in a partitioned way. In pictures, we have
The diagonal entry in column k of L is known to be one, of course.
Note that by sequencing the finding of g, u', and l as shown above
defines an algorithm which recursively works on the trailing
subblock. The resulting algorithm is
for k = 1:n-1
A(k:n,k) = A(k:n,k) - A(k:n,1:k-1)*A(1:k-1,k)
A(k,k+1:n) = A(k,k+1:n) - A(k,1:k-1)*A(1:k-1,k+1:n)
A(k+1:n,k) = A(k+1:n,k)/A(k,k)
end for
The computational kernels are (in order) matrix-vector product,
vector-matrix product, and vector scaling. So the rank-1 update
has been replaced by matrix-vector products, which have half the
memory reference to flop ratio of a rank-1 update. This is a
big win on a cache-based machine.
Minor point performance-wise, but big in terms of correctness:
there is a final update of A(n,n) which has been left off
above. Figure out what it is and correct the algorithm.
Graphically, the LU matrix-vector product form procedure is:
A final note here: the algorithms above (rank-1 and matrix-vector product
formulations) can be applied to an m by n rectangular matrix. If m > n,
then L is lower trapezoidal and U is upper triangular. If m < n, then
U is upper trapezoidal, and L is lower triangular.
Next page: LU Factorization and Required BLAS Operations
- Started: Thu Aug 23 17:10:49 EST 2001
- Modified: Tue Sep 24 08:33:54 EST 2002, to include the Matlab index vector versions.
- Modified: Wed Feb 11 13:34:47 EST 2004, to add the question above ("Figure
out what it is and correct the algorithm.")
- Last Modified: Thu 29 Oct 2009, 12:00 PM