Sparse Matrices and Fill-in

Fill-in and sparse matrices

When computing the LU or QR factorization of a matrix, a common operation is adding a multiple of one row to another occur. If a multiple of row i is added to row j, entries that are zero in row j but are nonzero in row i will require storing new nonzeros in the modified row j. This accumulation of new nonzeros is called fill-in.

When A is stored as a dense 2D n×n array, any potential fill-in space is already provided. For sparse matrices, the data structure must be modified unless the storage and row/column indices are pre-allocated in advance. So one feature of a data structure for a sparse matrix is how easily new entries can be added to existing rows.

Direct Solve of Linear System

Repeating: performing Gaussian elimination (LU factorization) on the sparse systems causes fill-in, nonzeros of L and U appearing where the coefficient matrix A had zeros. These fill-in entries are limited to within the bandwidth of the matrix A (with pivoting, they can appear elsewhere but the storage costs below remain the same). This is worth illustrating. Consider the following 8×8 example matrix A, where a * indicates a nonzero in the coefficient matrix, and a blank indicates a zero.

stargraph matrix

Now consider applying LU factorization without pivoting to the matrix A. At the first step, A(1,1) is used to zero out A(2,1) by adding a multiple of row 1 to row 2. Doing this causes new nonzeros to be introduced in the rest of row 2, in entries A(2,3:n) shown by a red F below:

stargraph, level one fill

The new nonzeros introduced are indicated with a red "F", for fill-in. Next, A(1,1) is used to zero out A(3,1), introducing new nonzeros in entries A(3,2) and A(3,4:n):

stargraph, level two fill

Continuing this process, by the time A(1,1) has been used to zero out A(n,1), the entire array has all nonzeros in it:

stargraph, level n fill

Don't forget that the first column is still nonzero, since it is used to store the multipliers used in zeroing out the corresponding entries. So in this case, a sparse matrix requiring only 3n-2 entries to be stored, has LU factors that require storing n2 entries.

Matrix Reordering

Different ways of ordering the unknowns in the PDE mesh can cause greatly different nonzero structures for the matrix A, and change the cost. In best case, if A is of order n adroit choice of reorderings can reduce cost of storage from O(n2) to O(n). Consider the original coefficient matrix used above to illustrate fill-in. Order the equations so that the first equation appears last, and the last equation appears first. The nonzero pattern now becomes

reverse equations

Renumbering the equations this way moves rows in the matrix - in this case, it interchanges the first and last rows. Next, renumber the unknowns so that the first unknown x(1) becomes the last unknown x(n), and the last unknown becomes the first. Renumbering the unknowns this way causes the columns of A to be reordered, in this case interchanging the first and last columns. The new nonzero pattern becomes

reverse equations

Now perform LU factorization without pivoting on this matrix - what is the resulting nonzero sparsity pattern for the L and U factors? How much fill-in occurs?

A large amount of research has gone into matrix reordering methods - originally to reduce the amount of fill-in, and now to include goals such as getting a sparsity pattern amenable to parallel computation. Matrix reordering corresponds to the labeling of nodes in a related graph from 1 to n. Different ways of labeling the graph nodes leads to different matrix sparsity patterns, and most of the algorithms are formulated in graph algorithm terminology.

Discretized PDE Sparsity Patterns

If a one-dimensional PDE is discretized using standard second order centered differences to approximate derivatives, the resulting linear system is tridiagonal: nonzeros appear only on the main diagonal, the first superdiagonal, and the first subdiagonal. LU factorization can be done with no fill-in occurring, so it requires O(N) storage and work, where N = number of unknowns.

For a regular N x N mesh in 2D, the cost becomes O(N3). Note O(N4) needed when treating A as a full (dense) array. Also, no reordering of PDE unknowns will give a smaller order of storage.

For a 3D uniform mesh, the cost is O(N5). This is the crucial reason 3D is such a challenge to scientific computing. For a 1 Gybte memory machine, the effective limit for 2D problems is a 11000 x 11000 mesh; for 3D it is a 500 x 500 x 500 mesh. That means we can resolve features 22 times more accurately on a 2D mesh than on a 3D one. In practice, modern scientific computing methods don't use uniform meshes, and much research concentrates mesh points in areas of "interest", where gradients are large and the physics of the problem is rapidly changing. However, the dimensionality problem still means 3D is more difficult. Sidenote: a great deal of research also has been done on out of core methods for doing LU factorization and solves on large sparse systems of equations. These methods use sophisticated staging methods to handle problems larger than will fit into memory at one time. However, they are still much slower than problems that (mostly) fit into memory at one time.

The above estimates are for square/cubic domains. Practical problems often involve more complicated geometries for which reordering unknowns may or may not provide reasonable direct solutions. Rule of thumb for direct systems: direct methods are best on most modern workstations if order of A is n <= 12000; for larger machines n <= 20000. Those number will change dramatically as algorithms, implementations, and memory systems improve, however, so treat them as rough guidelines circa 2012. Sparse direct solvers can handle some hugh systems readily, yet choke on systems an order of magnitude smaller. It depends on the fill-in encountered, which is not predictable without first carrying out "symbolic factorization", as was done on the original coefficient matrix above. We just mimicked the operations of LU factorization without actually carrying out any floating point arithmetic, to see where new nonzeros would appear. In general that requires some complex (integer) computation and graph algorithms.

One rule does hold: You should never write a direct solver for linear systems. Use existing packages, because they will invariably do a better job than you can. Some to consider using are:

  1. Linpack: old but good one.
  2. Lapack: modern, more efficient, but much larger package with a more complex API.
  3. Sparspak: an early package that unfortunately requires purchasing a license.
  4. PSE solvers: Matlab's built in backslash operator is efficient and uses latest methods.
  5. SuperLU: solver from UC-Berkeley, now based at NERSC.
  6. UMFPack: solver from Tim Davis (at the Univ. Florida).
For a little money (at least for universities), you can also use commercial packages: NAG, IMSL, MATLAB®, etc.

In general, nobody is really interested in solving linear systems: they want to solve PDE's, optimization problems, circuit simulation, etc. Many software packages for those problems have specialized linear solvers that take advantage of special characteristics from the application. This is particularly the case for iterative methods, and designing effective preconditioners for them.

Some other general scientific computing software to consider using are:

  1. National HPCC Software Exchange (NHSE) which may be defunct now
  2. Netlib Repository at UTK/ORNL, the place to start a search for solvers
  3. The MathWorks Web Site (MathWorks is the company that provides MATLAB)
  4. Guide to Available Mathematical Software


Next page: Sparse Matrix Representations