The nonzero entries A(i,j) = 10*i + j are chosen to make it easier to see where the entry is located; the first digit gives the row number and the second one gives the column number. For the problems and iterative methods we will use, the values are actually double precision floating point numbers. Links back to this example are in each data structure section, to make it easier to return to it. I recommend that you open two windows (or at least two tabs) to keep the example in view while reading about each data structure.
The first three of the data structures below require two 1D arrays of integers and one 1D array of doubles. In the tables displaying those arrays, to make it easier to keep track of the correct indices an extra table row proceeds them, in a greenish color. It is not part of the data structures, just a convenience similar to having column headers in a spreadsheet. The tables displaying the arrays of integers and doubles are split in COO and CSR, in case you have a narrow browser window. The names of the arrays used below are mostly standard, although the three (rowptr, col, val) in CSR are more commonly named (ia, ja, a). In the text empty parentheses are appended to "col" and "val" to emphasize that they are arrays. When those terms are embedded in text with a specific index (e.g., col(k)) , or are written as a triplet (row, col, val) the parentheses are omitted. So "col( )" and "val( )" just means they are arrays, not functions.
The example matrix above has nnz = 21 nonzeros, but some arrays are shown with an unused nnz+1 = 22nd entry. In practice the arrays are allocated with only nnz, not nnz+1, components. For debugging purposes only, you might allocate nnz+1 entries instead and then set row(nnz+1) = col(nnz+1) = -5 and val(nnz+1) = NaN; that will cause your code to stop with an error condition if you accidentally run off the end of the array. If you do this, be sure to change the codes back to using only nnz entries before handing anything in.
COO features:
Initialize y(1:n) = 0In two places the loop requires indirect addressing, for y(row(k)) and x(col(k)). [Two instead of three, because although y(row(k)) is both read and written, its address only needs to be computed once per iteration.] Indirect addressing in innermost loops is deadly for performance.
for k = 1 to nnz
y(row(k)) = y(row(k)) + val(k)*x(col(k))
end for
For the example matrix listed above, the
matrix would be stored as
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| val | 11 | 12 | 14 | 22 | 23 | 25 | 31 | 33 | 34 | 42 | 45 |
| row | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | 4 | 4 |
| col | 1 | 2 | 4 | 2 | 3 | 5 | 1 | 3 | 4 | 2 | 5 |
| Index | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
| val | 46 | 55 | 65 | 66 | 67 | 75 | 77 | 78 | 87 | 88 | - |
| row | 4 | 5 | 6 | 6 | 6 | 7 | 7 | 7 | 8 | 8 | - |
| col | 6 | 5 | 5 | 6 | 7 | 5 | 7 | 8 | 7 | 8 | - |
Don't forget: in the table, the row labeled "Index" is for convenience, and is not part of the data structure. Also, for the example matrix the three arrays (row, col, val) in COO storage would be of length 21, not 22. Although the arrays show the nonzeros in row order, in practice the nonzeros could be placed in any order - row after row, column after column, or even randomly.
CSR stores the column indices and values just like COO, but entries
must be listed one row after another. Doing that reduces the required information
about row indices to just knowing
where a row begins in the column index and value arrays col( ) and val( ).
This is done using an integer array of length n+1, such that the k-th entry
gives the index in col( ) and val( ) where the k-th row begins. That accounts for the
first n entries in the row index information array (which is usually called
"row pointers" or in the tables below, rowptr( ) ).
The last entry, rowptr(n+1), has the value nnz + 1. That allows the loops
implementing a matrix*vector product without having a test for the special
case where the last row is being used, as shown below.
Beware that the term "row pointer" is common, but what the integer array row( ) really contains are the indices (usually 1-based) of where each row starts in the arrays col( ) and val( ). It does not contain a pointer in the usual C/C++/Fortran sense.
Features:
Initialize y(1:n) = 0 for i = 1 to n for k = row(i) to row(i+1)-1 y(i) = y(i) + val(k)*x(col(k)) end for end forWhat can you say about the efficiency of the matrix-vector product (pipelining, data reuse ...)?
For the example matrix at the top
of the page, the CSR arrays are:
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| val | 11 | 12 | 14 | 22 | 23 | 25 | 31 | 33 | 34 | 42 | 45 |
| col | 1 | 2 | 4 | 2 | 3 | 5 | 1 | 3 | 4 | 2 | 5 |
| rowptr | 1 | 4 | 7 | 10 | 13 | 14 | 17 | 20 | 22 |
| Index | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
| val | 46 | 55 | 65 | 66 | 67 | 75 | 77 | 78 | 87 | 88 | - |
| col | 6 | 5 | 5 | 6 | 7 | 5 | 7 | 8 | 7 | 8 | - |
The arrays val( ) and col( ) above have been split into two separate tables with the rowptr( ) array between them to make it easier to see how the rowptr( ) array of indices matches up. As a simple example, the fourth row of the matrix begins with the index row(4) = 10 and ends with the index row(4+1) - 1 = row(5) - 1 = 13 - 1 = 12. What would the entries in row( ) be if the fourth row was all zeros and none of them are explicitly stored? Will the matrix-vector product pseudo-code still work correctly?
The first n entries in the integer array bindx stores the indices where each row's off-diagonal entries begin. The n+1 entry contains nnz + 1 so that (as with CSR) the matrix-vector product can be written with a single nested loop and no special cases handled for the last row of the matrix. The array of values stores first the n diagonal entries, then skips a single unused entry in position n+1, followed by the off-diagonal entries, row-by-row.
For the example matrix at the top
of the page, MSR storage is:
| Indices | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
| bindx | 10 | 12 | 14 | 16 | 19 | 19 | 21 | 23 | 24 | 2 | 4 | 3 | 5 | 1 | 4 | 2 | 5 | 6 | 5 | 7 | 5 | 8 | 7 |
| val | 11 | 22 | 33 | 0 | 55 | 66 | 77 | 88 | -- | 12 | 14 | 23 | 25 | 31 | 34 | 42 | 45 | 46 | 65 | 67 | 75 | 78 | 87 |
You should try to write what the sparse matrix-vector multiplication looks like in MSR.
Something to beware of with COO, CSR, and MSR: all three of these were originally used with Fortran, and hence the indices into the arrays are sometimes 1-based. However, many modern packages are written in C/C++, so sometimes 0-based indexing is used. To make it even more confusing, to maintain compatibility with Fortran drivers, some C/C++ packages use 1-based indexing. You need to carefully check the documentation, and sometimes even the source code, to determine which is used.
Sometimes a C++ package will assume the indexing is 1-based, and shift the integer arrays accordingly to use the more convenient 0-based C indexing internally. Then the package shifts indices back to 1-based before returning to the calling program unit. Others have the user pass in a flag that indicates whether or not the data structure is using 0- or 1-based indexing. Most of the recent ones allow the data to be input with either 0 or 1 based indexing, and the user provides a flag indicating if the index is "C style" or "Fortran style". RTFM (which I believe is the acronym for "read the fine manual").
Initialize y(1:n) = 0 for diag = -p to q for k = max(1,1-diag) to min(n,n-diag) y(k) = y(k) + val(k,diag)*x(k+diag) end for end forFor the example matrix the storage is in the 2D array of dimensions (p+q+1) x n:
| Diag | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | |
| -2 | * | * | 31 | 42 | 0 | 0 | 75 | 0 | |
| -1 | * | 0 | 0 | 0 | 0 | 65 | 0 | 87 | |
| 0 | 11 | 22 | 33 | 0 | 55 | 66 | 77 | 88 | |
| 1 | 12 | 23 | 34 | 45 | 0 | 67 | 78 | * | |
| 2 | 0 | 0 | 0 | 46 | 0 | 0 | * | * | |
| 3 | 14 | 25 | 0 | 0 | 0 | * | * | * |
The first row and column (labeled "Diag") in the table are not part of the storage scheme, and are just for convenience in reading the table. Note that the numbering of the diagonals, with the main diagonal as zero, subdiagonals with negative numbers, and superdiagonals positive numbers, matches what MATLAB does in its tril( ) and triu( ) operators.
Entries marked with an asterisk (*) are nonexistent ones. For example, the (1,1) entry in the array above would correspond to the (1,-1) entry in A, which does not exist. Similarly the (6,8) entry in the array would be the A(8,11) but A is only 8x8. Any algorithm should not reference those entries, or require them to be set to zero. That is the reason for the max( ) and min( ) operators in the loop controls in the matrix-vector product. It also provides an easy check for some indexing errors: set those entries to NaN, and set the compiler flags so that encountering NaN will trigger a stopping error.
Three arrays are required: val(1:nenv) for nonzero values, rowptr(1:n+1) for row pointers, and fstcol(1:n) for column number of first nonzero in each row (fstcol is shorthand for "first column"). Matrix-vector products become
for k = 1:n y(k) = 0 xcol = fstcol(k) istart = rowptr(i) iend = rowptr(i+1)-1 for j = istart:iend y(k) = y(k) + val(j)*x(xcol+j-istart) end for end forA key feature is that the innermost look does not have any indirect addressing, meaning that it can generally run faster than if COO or CSR are used.
For the example matrix, skyline storage is :
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | … |
| val | 11 | 12 | 0 | 14 | 22 | 23 | 0 | 25 | 31 | 0 | 33 | … |
| 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | … |
| 34 | 42 | 0 | 0 | 0 | 45 | 46 | 55 | 65 | 66 | 67 | 75 | … |
| 24 | 25 | 26 | 27 | 28 |
| 34 | 42 | 0 | 0 | 0 |
| rowptr | 1 | 5 | 9 | 13 | 19 | 20 | 23 | 27 | 29 |
| Fstcol | 1 | 2 | 1 | 2 | 5 | 5 | 5 | 7 |