where w and d are n×1 vectors. Recall that the linear system being solved is denoted as A*x = b,
which is why different vectors (d and w) are used here. Looking ahead, the estimate of the solution vector x is generally updated on each iteration by some multiple of d, and the residual vector r = b - A*x is updated by some multiple of w. Because the matrix-vector product is needed on every iteration, it accounts for the majority of the computational time in iterative solvers.
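To see why the residual changes by a multiple of w, here is a minimal sketch of one update step in C. The step length alpha, the function name update_iterate, and the sign convention x <- x + alpha*d are assumptions for illustration; how alpha and d are chosen depends on the particular iterative method. With w = A*d, the new residual is b - A*(x + alpha*d) = r - alpha*w, so no second product with A is needed.

```c
#include <stddef.h>

/* One generic update step (illustrative; alpha and d come from the
 * particular iterative method, which is not specified here).
 * Assumes w = A*d has already been computed.
 *   x <- x + alpha*d
 *   r <- r - alpha*w   (equals b - A*x for the updated x) */
void update_iterate(size_t n, double alpha,
                    const double *d, const double *w,
                    double *x, double *r)
{
    for (size_t i = 0; i < n; i++) {
        x[i] += alpha * d[i];
        r[i] -= alpha * w[i];
    }
}
```

Both loops cost only O(n) operations, which is why the product w = A*d remains the dominant cost of each iteration.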
You should be able to easily write the code in one of C, C++, or Fortran that does this multiplication when A is a dense matrix stored in a 2D array. If not, stop reading this until you can do so.
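For reference, one possible version in C is sketched below. It assumes a square n×n matrix held in a C99 variable-length 2D array (row-major storage); the function name dense_matvec is made up for this example.

```c
#include <stddef.h>

/* Dense matrix-vector product w = A*d.
 * A is an n-by-n matrix stored as a C99 2D array (row-major),
 * d is the input vector and w receives the result. */
void dense_matvec(size_t n, double A[n][n], const double d[n], double w[n])
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;               /* accumulate row i of A times d */
        for (size_t j = 0; j < n; j++)
            sum += A[i][j] * d[j];
        w[i] = sum;
    }
}
```

The inner loop runs along a row of A, which matches C's row-major layout. In Fortran, where arrays are stored by column, the loops would normally be interchanged so that the innermost loop runs down a column.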
For direct methods (LU factorization, QR factorization), even higher computational rates are possible with frontal and supernodal methods, but those are beyond the scope of this course. Those methods allow using BLAS level 2 and sometimes BLAS level 3 routines, with concomitant efficiencies. Implementations and descriptions of them can be found in the packages SuperLU (from Xiaoye Li, Lawrence Berkeley Lab) and UMFPACK (from Tim Davis at the University of Florida).
The most general and widely used data structure is compressed sparse column (CSC), and it is the one that many libraries for sparse matrix problems use. With the increase in available memory sizes, most linear solver packages now allow the user to input a sparse matrix in COO format and then convert it internally to a more efficient data structure, often CSR. Previously this was not possible: two copies of the sparse matrix could not be held in memory, so any conversion required complicated in-place methods.
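As an illustration of the out-of-place conversion described above, here is a minimal sketch in C that copies a COO matrix into CSR arrays and then uses them for w = A*d. It is not how any particular package does it. The function names, the zero-based indexing, and the array names (row, col, val for COO; row_ptr, col_idx, csr_val for CSR) are assumptions for this example. Duplicate entries are kept as separate entries, and column indices within a row are left unsorted.

```c
#include <stdlib.h>

/* Convert an n-row matrix with nnz entries from COO to CSR (zero-based).
 * Output arrays are caller-allocated: row_ptr[n+1], col_idx[nnz],
 * csr_val[nnz]. A second copy of the matrix is made; this is not an
 * in-place method. Returns 0 on success, -1 if allocation fails. */
int coo_to_csr(int n, int nnz,
               const int *row, const int *col, const double *val,
               int *row_ptr, int *col_idx, double *csr_val)
{
    /* Count the entries in each row, stored one position to the right. */
    for (int i = 0; i <= n; i++) row_ptr[i] = 0;
    for (int k = 0; k < nnz; k++) row_ptr[row[k] + 1]++;

    /* Prefix sum: row_ptr[i] becomes the start of row i. */
    for (int i = 0; i < n; i++) row_ptr[i + 1] += row_ptr[i];

    /* next[i] is the next free slot in row i (n+1 avoids a zero-size malloc). */
    int *next = malloc((size_t)(n + 1) * sizeof *next);
    if (next == NULL) return -1;
    for (int i = 0; i < n; i++) next[i] = row_ptr[i];

    /* Scatter each COO entry into its row's segment. */
    for (int k = 0; k < nnz; k++) {
        int dest = next[row[k]]++;
        col_idx[dest] = col[k];
        csr_val[dest] = val[k];
    }
    free(next);
    return 0;
}

/* Sparse matrix-vector product w = A*d with A in CSR format. */
void csr_matvec(int n, const int *row_ptr, const int *col_idx,
                const double *csr_val, const double *d, double *w)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += csr_val[k] * d[col_idx[k]];
        w[i] = sum;
    }
}
```

The conversion makes two passes over the COO data plus an O(n) prefix sum, and needs only one temporary array of row counters beyond the second copy of the matrix itself.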