Requirements for Sparse Matrix Data Structures

In general, entries of a sparse matrix that are exactly 0.0 should not be stored, but in some cases it is more computationally efficient to allow a few explicit zeros, as with the compressed diagonal storage (CDS) format described below. When the matrix has nnz nonzeros, the number of these additional entries should be at most O(nnz). The applications considered in this course have nnz ∈ O(n), where n is the number of rows (and the number of columns) of the matrix. Furthermore, for the simple applications considered here, nnz ≤ 27n. More generally this need not hold, e.g., in linear programming or solving ordinary differential equations (ODEs) in 1 or 2 spatial dimensions.
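To give a feel for what the bound nnz ≤ 27n buys, the following rough sketch compares the memory for the values of a dense n × n matrix with an upper bound on the memory for the nonzero values alone. The function names, the choice n = 1,000,000, and the 8 bytes per value (double precision) are assumptions made for illustration; index storage, which every sparse format also needs, is deliberately left out.

```python
def dense_bytes(n, bytes_per_value=8):
    """Bytes needed to store every entry of an n-by-n matrix."""
    return n * n * bytes_per_value


def sparse_value_bytes_bound(n, max_per_row=27, bytes_per_value=8):
    """Upper bound on bytes for the nonzero values when nnz <= max_per_row * n.

    Index arrays are NOT counted here; their size depends on the format.
    """
    return max_per_row * n * bytes_per_value


n = 1_000_000  # assumed problem size, for illustration only
print(dense_bytes(n))               # 8000000000000 bytes (8 TB)
print(sparse_value_bytes_bound(n))  # 216000000 bytes (216 MB)
```

Under these assumptions the dense representation is out of reach for main memory, while the nonzero values fit easily.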

Data structures for sparse matrices sometimes require secondary storage (i.e., hard drive or SSD storage) during the computation, and they are the major determining factor for computational efficiency via pipelining and data locality. Let n be the number of rows in a matrix A, and let nnz be the number of nonzeros in A. For now, consider only square matrices; rectangular ones are readily handled similarly, typically at just the cost of storing one more integer (the number of columns). We will use the matrix-vector product as the kernel for exploring different data structures for sparse matrices, since it is the most common operation in methods that use them.
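As a baseline for that kernel, here is a minimal sketch of y = Ax when A is stored as three parallel lists of row indices, column indices, and values (often called coordinate or triplet storage). This format is chosen here only because it is the simplest to write down, not because the text singles it out; the function name and the 3 × 3 example are illustrative assumptions.

```python
def matvec_coo(n, rows, cols, vals, x):
    """Return y = A*x for an n-by-n matrix given as (row, col, value) triplets."""
    y = [0.0] * n
    # Each stored nonzero contributes exactly one multiply-add.
    for i, j, a in zip(rows, cols, vals):
        y[i] += a * x[j]
    return y


# Example matrix (zeros are simply not stored):
#   [2 0 1]
#   [0 3 0]
#   [4 0 5]
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = matvec_coo(3, rows, cols, vals, x)
print(y)  # [5.0, 6.0, 19.0]
```

The work is proportional to nnz rather than n², but the accesses to x and y jump around in memory according to the order of the triplets, which is the kind of data-locality concern mentioned above.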

User ease

Because available memory sizes have increased (in 2012, up to 64 Gbytes for a single processor), the data structure that end users interact with need not be the one that is actually stored and used for computations. Libraries that provide sparse matrix capabilities commonly offer an interface that lets users enter the data readily; internally, the data is then converted to a data structure that is more efficient for computations. Similarly, on output you want to display the matrix in a way that allows users to determine whether the matrix is correct.
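The split between a convenient input interface and an efficient internal format might look like the sketch below. The class name TripletBuilder and its methods are hypothetical, not the API of any particular library, and compressed sparse row (CSR) storage is assumed as the internal format purely as one common choice.

```python
class TripletBuilder:
    """User-facing interface: enter entries one at a time, in any order."""

    def __init__(self, n):
        self.n = n
        self.entries = {}  # (row, col) -> value

    def add(self, i, j, value):
        # Repeated (i, j) pairs accumulate, which is convenient for assembly.
        self.entries[(i, j)] = self.entries.get((i, j), 0.0) + value

    def to_csr(self):
        """Convert to CSR arrays: row pointers, column indices, values."""
        row_ptr = [0] * (self.n + 1)
        col_idx, values = [], []
        # Sorting by (row, col) groups each row's entries contiguously.
        for (i, j), v in sorted(self.entries.items()):
            if v != 0.0:  # do not store exact zeros
                row_ptr[i + 1] += 1
                col_idx.append(j)
                values.append(v)
        # Turn per-row counts into starting offsets.
        for i in range(self.n):
            row_ptr[i + 1] += row_ptr[i]
        return row_ptr, col_idx, values


builder = TripletBuilder(3)
builder.add(2, 2, 5.0)
builder.add(0, 0, 2.0)
builder.add(1, 1, 3.0)
builder.add(0, 2, 1.0)
builder.add(2, 0, 4.0)

row_ptr, col_idx, values = builder.to_csr()
print(row_ptr)  # [0, 2, 3, 5]
print(col_idx)  # [0, 2, 1, 0, 2]
print(values)   # [2.0, 1.0, 3.0, 4.0, 5.0]
```

The dictionary is easy to fill and to inspect, while the three flat arrays are what a computational kernel would traverse.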

Fill-in and sparse matrices

Another consideration is fill-in, described in more detail here. When computing the LU or QR factorization of a matrix, a common operation is adding a multiple of one row to another. If a multiple of row i is added to row j, entries that are zero in row j but nonzero in row i require storing new nonzeros in the modified row j. So one feature of a data structure for a sparse matrix is how easily new entries can be added to existing rows.
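A small sketch makes the fill-in effect concrete. It assumes each row is stored as a dictionary mapping column index to value, a layout chosen here only because insertion is trivial; the function name is hypothetical.

```python
def add_multiple_of_row(rows, i, j, alpha):
    """Replace row j by row j + alpha * row i (requires i != j).

    rows is a list of dicts {column: value}. Returns the list of columns
    in which row j gained a new nonzero (the fill-in).
    """
    fill = []
    for col, val in rows[i].items():
        new = rows[j].get(col, 0.0) + alpha * val
        if new == 0.0:
            rows[j].pop(col, None)  # exact cancellation: stop storing it
        else:
            if col not in rows[j]:
                fill.append(col)    # zero in row j, nonzero in row i
            rows[j][col] = new
    return fill


# Row 0 has nonzeros in columns 0 and 2; row 1 in columns 0 and 1.
rows = [{0: 2.0, 2: 1.0}, {0: 4.0, 1: 3.0}]

# Eliminate the (1, 0) entry: row 1 <- row 1 - (4/2) * row 0.
fill = add_multiple_of_row(rows, 0, 1, -2.0)
print(fill)     # [2]
print(rows[1])  # {1: 3.0, 2: -2.0}
```

With dictionaries the new entry in column 2 costs nothing extra to insert; in a format that packs all rows into contiguous arrays, the same insertion would force later entries to be shifted or the row to be reallocated.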

Augmenting a sparse matrix by including new rows or columns

If the matrix comes from a computational information retrieval problem, as in latent semantic indexing (LSI), each time a new document is added to the collection another column needs to be appended to the term-document matrix. Each time a new term is added, a new row must be added to the matrix, and all the preceding documents (columns) must be searched again to see if the new term appears in them. Adding new documents (especially web pages) is a common operation in LSI. Adding a new term may seem to be rare, but consider using a search engine that has not added terms like "LOLcat", "glitterbombed" or "clickjacking" to the list of terms its web spiders search for. So the ease and efficiency of adding new rows or columns to an existing sparse matrix data structure are important.
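The two kinds of growth can be sketched as follows. The class TermDocumentMatrix, its column-wise dictionary layout, and the whitespace tokenization are all simplifying assumptions for illustration, not a description of how any real search engine stores its data.

```python
class TermDocumentMatrix:
    """Rows are terms, columns are documents, entries are term counts."""

    def __init__(self):
        self.terms = []      # row index -> term
        self.documents = []  # column index -> text, kept so new terms can be searched
        self.columns = []    # column index -> {row index: count}

    def add_document(self, text):
        """Append a column: only the new document has to be scanned."""
        words = text.lower().split()
        column = {}
        for row, term in enumerate(self.terms):
            count = words.count(term)
            if count:
                column[row] = count
        self.documents.append(text)
        self.columns.append(column)

    def add_term(self, term):
        """Append a row: every existing document has to be searched again."""
        term = term.lower()
        row = len(self.terms)
        self.terms.append(term)
        for text, column in zip(self.documents, self.columns):
            count = text.lower().split().count(term)
            if count:
                column[row] = count


m = TermDocumentMatrix()
m.add_term("matrix")
m.add_document("sparse matrix storage")
m.add_document("a lolcat picture")
m.add_term("LOLcat")
print(m.columns)  # [{0: 1}, {1: 1}]
```

Because this sketch stores the matrix by columns, appending a document touches one new column, whereas appending a term touches every column; a row-oriented layout would reverse that trade-off.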