Requirements for Sparse Matrix Data Structures
In general, entries of a sparse matrix that are exactly 0.0 should not
be stored, but in some cases it is
more computationally efficient to allow a few of them to be stored, as with
the compressed diagonal storage (CDS) scheme described below. When the
matrix has nnz nonzeros, the number of
these additional stored zeros should be at most O(nnz). The
applications considered in this course actually have nnz ∈ O(n)
where n is the number of rows and number of columns of the matrix.
Furthermore, for the simple applications considered here,
nnz ≤ 27n. More generally that is not the case, e.g., in linear programming or
solving ordinary differential equations (ODEs) in 1 or 2 spatial dimensions.
Data structures for sparse matrices sometimes require secondary
storage (i.e., hard drive or SSD storage) during the computation,
and they are the major determining factor for computational efficiency via pipelining and
data locality. Let n be the number of rows in a matrix A,
and let nnz be the number of nonzeros in A. For now, we deal only
with square matrices; rectangular ones are readily handled similarly,
typically at just the cost of storing another integer (the number of
columns). We will
use the matrix-vector product as the kernel for exploring different data structures
for sparse matrices, since that is the most common operation in
methods that use them.
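To make the kernel concrete, here is a minimal sketch of a matrix-vector product y = Ax that touches only the stored entries. It assumes a simple coordinate (triplet) layout with three parallel lists; that layout and the names coo_matvec, rows, cols and vals are illustrative choices, not a structure prescribed by these notes. The work is proportional to nnz rather than to n squared.

```python
def coo_matvec(n, rows, cols, vals, x):
    """Return y = A x for an n-by-n matrix stored as coordinate triplets.

    rows[k], cols[k], vals[k] describe the k-th stored entry of A.
    """
    y = [0.0] * n
    # One multiply-add per stored entry: O(nnz) work in total.
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
    return y


# Hypothetical 3-by-3 example: A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
y = coo_matvec(3, rows, cols, vals, [1.0, 2.0, 3.0])
```

Note that the update y[i] += a * x[j] reads x and writes y in whatever order the triplets happen to be stored, which is exactly where data locality starts to depend on the data structure.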
User ease
Because available memory sizes have grown (in 2012, up to 64 Gbytes for a single
processor),
the data structure that end users interact with need not be
the one that is actually stored and used for computations.
For libraries that provide sparse matrix
capabilities, it is common to have an interface that lets users enter the
data readily; internally the data is then converted to a data structure that
is more efficient for computations. Similarly, on output the matrix should be
displayed in a way that allows users to check that the matrix is
correct.
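As a rough illustration of that two-layer design, the sketch below lets a user enter entries in any order as a dictionary keyed by (row, column), converts them to compressed row arrays for computation, and prints a dense view for checking small matrices. The compressed-row target and the names triplets_to_csr and format_dense are assumptions made for this example, not the interface of any particular library.

```python
def triplets_to_csr(n, entries):
    """Convert {(i, j): value} into compressed row arrays.

    Returns (row_ptr, col_idx, vals); row i occupies positions
    row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and vals.
    """
    row_ptr = [0] * (n + 1)
    for (i, _j), v in entries.items():
        if v != 0.0:                      # never store exact zeros
            row_ptr[i + 1] += 1
    for i in range(n):                    # running sum gives row starts
        row_ptr[i + 1] += row_ptr[i]
    col_idx = [0] * row_ptr[n]
    vals = [0.0] * row_ptr[n]
    next_free = row_ptr[:n]               # next unused slot in each row
    for (i, j), v in sorted(entries.items()):
        if v != 0.0:
            k = next_free[i]
            col_idx[k], vals[k] = j, v
            next_free[i] += 1
    return row_ptr, col_idx, vals


def format_dense(n, row_ptr, col_idx, vals):
    """Render the matrix densely so a user can inspect a small case."""
    lines = []
    for i in range(n):
        row = [0.0] * n
        for k in range(row_ptr[i], row_ptr[i + 1]):
            row[col_idx[k]] = vals[k]
        lines.append(" ".join(f"{a:5.1f}" for a in row))
    return "\n".join(lines)


# Entries given in arbitrary order, as a user might type them.
entries = {(0, 0): 2.0, (2, 2): 5.0, (0, 2): 1.0, (1, 1): 3.0, (2, 0): 4.0}
row_ptr, col_idx, vals = triplets_to_csr(3, entries)
text = format_dense(3, row_ptr, col_idx, vals)
```

The dense display is only sensible for small test matrices; for large ones a library would more plausibly print the stored triplets or a sparsity pattern.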
Fill-in and sparse matrices
Another consideration is fill-in.
When computing the
LU or QR factorization of a matrix, a common operation is
adding a multiple
of one row to another. If a multiple of row
i is added to row j, entries that are zero in row
j but are nonzero in row i will require storing
new nonzeros in the modified row j. So one feature of a data structure
for a sparse matrix is how easily new entries can be added to existing rows.
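The row operation above can be sketched with each row held as a dictionary mapping column index to value, a layout chosen here only because it makes inserting a new entry trivial. The function name add_multiple_of_row and the sample data are hypothetical.

```python
def add_multiple_of_row(rows, i, j, alpha):
    """Perform row_j += alpha * row_i on a list of {column: value} dicts.

    Returns the columns of row j where fill-in occurred, i.e. positions
    that were not stored before the update but are stored after it.
    """
    assert i != j, "this sketch assumes two distinct rows"
    fill = []
    for col, a in rows[i].items():
        new = rows[j].get(col, 0.0) + alpha * a
        if new == 0.0:
            rows[j].pop(col, None)        # exact cancellation: stop storing it
        else:
            if col not in rows[j]:
                fill.append(col)          # a zero in row j became nonzero
            rows[j][col] = new
    return fill


rows = [
    {0: 2.0, 3: 1.0},   # row 0 has nonzeros in columns 0 and 3
    {0: 4.0, 1: 3.0},   # row 1 has nonzeros in columns 0 and 1
]
# Eliminate the (1, 0) entry using row 0 as the pivot row.
fill = add_multiple_of_row(rows, 0, 1, -2.0)
```

With a dictionary per row the insertion is cheap, but packed array formats would have to shift or reallocate storage to make room for the new entry in column 3.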
Augmenting a sparse matrix by including new rows or columns
If the matrix comes from a computational information retrieval problem, as in latent semantic indexing (LSI),
each time a new document is added to the collection another
column needs to be appended to the term-document matrix. Each time a new term is
added,
a new row must be added to the matrix, and all the preceding documents
(columns) must be searched again to see if the new term appears in them.
Adding new documents (especially web pages) is a common operation in LSI.
Adding a new term may seem to be rare, but consider
using a search engine which has not added terms like "LOLcat", "glitterbombed"
or "clickjacking" to the list of terms its web spiders search for.
So the ease and efficiency of adding
new rows or columns to an existing sparse matrix data structure are important.
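The asymmetry between the two operations shows up in a small sketch. It assumes a column-oriented layout with one dictionary per document and keeps the raw tokens so that earlier documents can be searched again; the class TermDocumentMatrix and its methods are hypothetical names for illustration only.

```python
class TermDocumentMatrix:
    """Rows are terms, columns are documents, entries are term counts."""

    def __init__(self):
        self.terms = {}     # term -> row index
        self.docs = []      # raw token lists, kept for later re-searching
        self.columns = []   # one {row index: count} dict per document

    def add_document(self, tokens):
        """Append a column; cost depends only on the new document."""
        column = {}
        for tok in tokens:
            row = self.terms.get(tok)
            if row is not None:
                column[row] = column.get(row, 0) + 1
        self.docs.append(tokens)
        self.columns.append(column)

    def add_term(self, term):
        """Append a row; every existing document must be searched."""
        if term in self.terms:
            return self.terms[term]
        row = len(self.terms)
        self.terms[term] = row
        for tokens, column in zip(self.docs, self.columns):
            count = tokens.count(term)
            if count:
                column[row] = count
        return row


m = TermDocumentMatrix()
m.add_term("matrix")
m.add_term("sparse")
m.add_document(["sparse", "matrix", "storage"])
m.add_document(["dense", "matrix", "matrix"])
new_row = m.add_term("storage")   # forces a pass over both documents
```

Here appending a document touches one new column, while adding the term "storage" walks over the whole collection, which is the cost the paragraph above describes.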