Basics of Linear Algebra


This is not an exhaustive review, but instead is primarily a listing of some terminology that we need to use. We are not concerned about general vector spaces, so we can restrict attention to the space Rn, the set of n-vectors with elements drawn from the reals. By default, x in Rn is a column vector (an n × 1 matrix). If we need a row vector it will be indicated by transposing a column vector: xT. In Matlab a transpose is indicated by a single quote mark, e.g. x'. What if x is a complex vector in Matlab? See what x' is then. All of the vectors and matrices used in P573 will be real-valued unless otherwise explicitly stated.

Vector Norms

There are several "norms" or measures of vectors. The reason for specifying these as vector norms is because there are also matrix norms, defined later in this page. For a vector v with components v1), v2), ..., vn), the three most common norms used are

In Matlab, those vector norms are given by functions For large matrices A, avoid using the 2-norm and alway use any other norm. For square matrices A, the two-norm is the largest absolute value of the eigenvalues of A. This can be expensive (in time and computation) to compute.

Linear Dependence

Let x1, x2, ..., xm be vectors in Rn, and let a1, a2, ..., am be real numbers. Then the vector


y = a1 x1 + a2 x2 + ... + am xm

is called a linear combination of the vectors; the scalars ai are called the coefficients of the linear combination. If all the ai are zero the linear combination is the trivial one. A set of vectors x1, x2, ..., xm in Rnare linearly independent if the only linear combination of them that adds up to zero, is the trivial one. Otherwise they are called linearly dependent. If a set of vectors x1, x2, ..., xmis linearly dependent, there is a nontrivial (i.e., not all equal to 0) set of scalar coefficients a1, a2, ..., am such that

a1x1 + a2 x2 + ... + am xm= 0

In the above equation, the ai are scalars and the xi are vectors. [This violates the nomenclature convention discussed in class, mainly because most people don't have Greek letters supported by default in their browsers.] At least one of the coefficients is nonzero. As an example, suppose that a2 ≠ 0. Then the linear dependent property can be written as

x2 = -( a1/a2 x1 + ... + am/a2 xm)
This says the vector x2 contributes no additional information as far as linear combinations are concerned, because it can be represented using some linear combination of the other vectors. Keep in mind this information theoretic point of view; it often enables you to make sense of different linear algebra properties and operations.

Subspaces

S is a subspace of Rn if x+y and a*x are in S whenever a is a real scalar and x and y are vectors in S. A subspace can begenerated by a set of vectors V, by defining S to be the set of all possible linear combinations of vectors in V. A basis for a subspace S is a set B of vectors in S such that B is linearly independent, and B generates S. For a given subspace S, any two bases B1 and B2must have the same number of vectors; that number is the dimensionof the subspace.

Typically we stack together the vectors that form a basis of a subspace, putting them as columns of matrix. If the entire space is of dimension n, and the subspace is of dimension k, that creates an n × k matrix B. This gives a shorter way of writing the linear dependence relation


y = a1 x1 + a2 x2 + ... + am xm

by using the matrix with the basis vectors as its column:

y = B*a

The vector a in y = B*a is defined by a = (a1 , a2 , ... , ak). ( Slight digression: sometimes I will denote multiplication by juxtaposition, e.g. Ba. Other times, particularly when emphasis is needed on the multiplication operation, I'll use B*x.) This gives another good heuristic: a matrix * vector product is just a succint way of expressing a linear combination.

Matrices

An m × n matrix is often defined as a rectangular array of numbers with m rows and n columns. Extreme danger lurks here for computer scientists. The problem is that a matrix is a mathematical entity, while an array is a computer data structure that can be used to hold the matrix. So get it straight now: a matrix can be stored in any data structure (linked list, tree, two queues, etc). Except in some digital signal processing applications, matrices are always indexed from 1 by scientists, mathematicians, and other creatures that roamed the earth centuries before computer science was invented. However, a 5 × 3 matrix that is indexed starting from 1 can be stored as a 2D array with rows indexed from 0 to 4 and columns indexed from 0 to 2. Or it can have rows indexed from -17 to -13 (at least in Fortran or Blitz++), or be stored in a 1D array indexed from 0 to 14, or stored in a graph data structure, .... etc.

Structured Matrices

The term "structure" refers to many things in linear algebra, but the most common use in numerical computing involves the nonzero structure of the matrix. The reason we concentrate on the "nonzero pattern" is because many of the matrices that occur in solving partial differential equations (PDEs) and ordinary differential equations (ODEs) have all but a few of their entries equal to zero. Terms you should know include Other terms that are commonly used include A symmetric matrix A is one where Aij = Aji or equivalently, A = AT.

Although it sounds petty, being able to index from any given starting integer is extremely helpful in coding some numerical linear algebra operations. For example, a "banded matrix" is typically stored by diagonals (instead of by rows or columns). By convention, the main diagonal is numbered as the 0-th diagonal, the first superdiagonal is numbered as the 1-th diagonal, etc. The first subdiagonal is numbered -1, the diagonal below that is numbered -2, etc. It is much easier to read and debug a code using banded matrices if the arrays storing the matrix values can be indexed from -n to n.

Submatrices

Given a matrix A and a set of row/col indices {i1, i2, ..., ik}, the corresponding principal submatrix of A is the k × k array consisting of elements with row and column numbers in that index set. A leading principal submatrix has i1 = 1, i2 = 2, ..., ik = k, for some k. A trailing principal submatrix is similarly defined, for the last k possible indices. Submatrices are also obtained by partitioning a matrix. Often it is useful to consider an m × n matrix A as a row vector of n column vectors, each from Rm:
A = [a1 a2 ... an]
and other times it is convenient to consider A as a column vector of m row vectors, each from Rn:
A = [ a1T
         a2T
          .
          .
          .
         anT]
Note that for a given matrix A, the ai from a row partitioning is different from that given by a column partitioning (when will they be the same?). Also, more generally a matrix can be partitioned into blocks, where each block is itself a matrix.

Matrix Operations

Adding matrices, multiplying them by a scalar, and similar operations are easily defined in linear algebra. Make sure that you are familiar with multiplying matrices: in the operation C = A*B, the number of columns of A must equal the number of rows of B. If A is m × n and B is n × k, then C is m × k and its (r,c) element is the sum from i = 1 to i = k of Ari*Bic. That is, the rth row of A is multiplied times corresponding elements in the cth column of B. This holds even when Ari, Bic, etc. are subblocks of A and B rather than just real numbers. Note the special case of y = Ax, where A is m × n and y is m × 1, x is n × 1. Then by looking at A partitioned by columns, the equation y = A*x expresses y as a linear combination of the columns of A; the ithcolumn as coefficient xi, the ith component of the vector x. Transpose: if A is an m × n matrix, then AT is the n × m matrix with element (i,j) the same as the (j,i) element of A. Recall that
(AB)T = BT AT

There are four special cases of matrix multiplication that we'll spend a lot of time with.

  1. In C = A*B where A is 1xn and B is n × 1, then C is a scalar and the operation is called a dotproduct, inner product, or the scalar product of the vectors A and B.
  2. In C = C + A*B, if A is n × 1 and B is 1 × 1, then C is an n-vector and the operation is called a saxpy or daxpy operation. It is most commonly formulated as y = y + alpha*x, where x and y are n-vectors and alpha is a scalar.
  3. In C = C + A*B, if A is m × 1 and B is 1 × n, the operation is called a rank-one update of C. The reason is because the rank of the matrix being added to C is equal to one. The product A*B in this case is called the outer product of A and B. Any matrix with rank 1 can be written as the outer product of two vectors. Rank-one updates will be common in solving linear systems numerically.
  4. If if A is m × n and B is n × 1, the operation is called a matrix-vector product.

More Subspaces

The range of an m × n matrix A is the set of all possible linear combinations of the columns of A. Another way to express this is
{y: y = Ax for some vector x}, where braces indicate a set. The range of A, range(A), is a subspace (reality check: it is a subspace of which space, Rm or Rn?) The null space of A is the set of all vectors that A maps to the zero vector: {x: Ax = 0}. Stated otherwise, a vector is in the null space of A if its components give a linear combination of the columns of A that sums up to zero. Note that range(A) and null(A) live in different worlds; one is in Rm, the other in Rn. Also note that if A is square and of order n, then the dimensions of range(A) and null(A) sum to n.

Inverses

The most prototypical problem in numerical computing is to "solve" the equation Ax = b for x, given coefficient matrix A and right-hand side vector b. When A is square n × n and has full column rank, then a unique inverse matrix A-1 exists such that A-1A = I = AA-1, where I is the identity matrix of order n. The solution can be written as x = A-1b. However, in practice you never want to compute or form the inverse matrix A-1 in solving a linear system. Doing so requires more computation and has lower numerical accuracy than Gaussian elimination does. A square matrix is nonsingular if any of the following hold: When A is not square, we typically want a "least squares" solution to Ax = b. That is, find the vector x that minimizes the 2-norm of the vector Ax - b. This is how you experimentally find the (n½, rinf) parameters for the Hockney-Jessup model of a pipelined operation.

Matrix Norms

The norm of a vector gives its length, while the norm of a matrix gives the maximum (relative) amount by which it can stretch a vector:
||A||2 = max{||Ax||2/||x||2, over all nonzero vectors x}.
This may seem to be a circular definition, but it is not. It defines a matrix norm by using vector norms, which are already well-defined.. A can be m × n, so the norm ||A||2 in the numerator is for a vector of length m, while the norm ||x||2 in the denominator is for a vector of length n. Mutatis mutandi you can similarly define the 1-norm and infinity norm of a matrix, but they have an equivalent but easier computation:
This is not a complete definition of matrix norms; with one exception we are going to use the three above, which are examples of operator norms. One norm that is not defined as the maximum of a ratio of vector norms is the Frobenius norm:
Like any norm, matrix norms must satisfy the properties
A property that does not hold for all norms but does for the four matrix norms listed above is

||AB|| <= ||A||*||B||.

[Note the difference between this and the triangle inequality, which deals with the sum (not the product) of matrices.] If the matrix norm does satisfy this additional property, it is called a consistent norm.

Matrix norms are most commonly used in numerical analysis, to measure how close a computed matrix is to the ideal (in the Platonic sense) matrix. For example, a common approach to solving a linear system involves factoring it into the product of a lower and upper triangular matrices: A = L*U. After performing LU factorization the L and U factors are likely to have some errors from finite precision arithmetic. In that case, ||A-L*U|| would provide a measure of how far off the computed factorization is from the exact one. Here, A, L, and U refer to matrices. (Slight digression: in LU factorization the array A is usually overwritten with the combined L and U factors.) A less common use for matrix norms is to define information compression approximations to large operators. That is used in the context of text retrieval or web search engines.

Throughout the remainder of this course, all norms (matrix or vector) will be the 2-norm, unless explicitly stated otherwise. So the subscript 2 will be dropped hereafter.

Orthogonality and Orthogonal Matrices

An orthonormal basis for a subspace is especially convenient to work with. Let {qi, i = 1, 2, ..., k} be an orthonormal basis for a subspace S. The if x is a vector in S, its representation as a linear combination of the qis is easily computed as
x = (q1Tx)q1 + (q2Tx)q2 + ... + (qkTx)qk.

If you create the n × k matrix Q with columns consisting of those vectors qis , then the coefficients of the linear combination are the entries in the vector QTx. Finding the representation in a non-orthonormal basis is much more difficult computationally.

Because orthogonal matrices don't change the (2- or Frobenius-) norm, they are highly stable numerically when multiplying by them. In floating point arithmetic you can multiply a vector by millions of orthogonal matrices and have only mild loss of precision at the end. Multiplying by an orthogonal matrix is sometimes called "applying an orthogonal transformation".

Eigenvalues

Eigenvalues are only defined for square matrices. If there is a scalar s and nonzero vector x such that Ax = sx, then s is an eigenvalue of A and x is the corresponding eigenvector. Even if A has all real-valued entries, it may have complex-valued eigenvalues. Geometrically the relation Ax = sx says that in the direction given by the vector x, the action of A may stretch or reverse the vector by a factor of |s|, but it will not change the vector's direction otherwise. A (square, of course) matrix of order n has n eigenvalues, but those may not be distinct values - some eigenvalues may be repeated with different eigenvectors.
Numerically you will almost never encounter eigenvalues with this kind of "multiplicity"; floating point arithmetic invariably perturbs the array values representing the matrices and vectors, and so the eigenvalues will seem distinct.
When A is not symmetric, the eigenvalues may be complex and the eigenvectors may not form a basis for Rn. This suggests that maybe eigenvalues are not the right thing to look at for nonsymmetric matrices, and there is a better approach, the singular value decomposition, for decomposing a nonsymmetric or even a rectangular matrix.


Next page: LU Factorization