Linear Algebra Basics


This short review is primarily a listing of some needed terminology. In general the vector space used is restricted to the space ℜn, the set of n-vectors with elements drawn from the real numbers ℜ. One obvious exception is the vector space of all m × n matrices with elements drawn from the real numbers. By default, x ∈ ℜn is a column vector (an n × 1 matrix). A row vector is indicated by transposing a column vector: xT. In Matlab a transpose is indicated by a single quote mark, e.g. x'. But if x is a complex vector in Matlab, does x' give the transpose or the Hermitian transpose (AKA the conjugate transpose)? Answer this for yourself: set up a random vector of length n = 5 with complex numbers as its components:
    x = randn(5,1) + sqrt(-1)*randn(5,1)
and then see what x' is:
    x'
[If you want to use Matlab's help facility, doing "help ' " won't work. Instead use "help punct"]
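
For comparison, Matlab also has the .' operator, which is the plain (element-moving, non-conjugating) transpose. Comparing the two on your complex vector answers the question:

    x.'
    x' - x.'

If the difference is nonzero, then ' must be doing something more than plain transposition.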

All of the vectors and matrices used in P573 will be real-valued unless otherwise explicitly stated. The only exception is for eigenvalues of nonsymmetric matrices. Sometimes I will denote multiplication by juxtaposition, e.g. Ba. Other times, particularly when emphasis is needed on the multiplication operation, I'll use B*a.


Vector Norms

There are several "norms" or measures of the sizes of vectors. The reason for specifying these as vector norms is because there are also matrix norms, defined later in this page. For a vector v with components v1, v2, ..., vn, the three most common norms used are

    ||v||1 = |v1| + |v2| + ... + |vn|                  (the 1-norm)
    ||v||2 = sqrt(v1^2 + v2^2 + ... + vn^2)            (the 2-norm, or Euclidean length)
    ||v||∞ = max(|v1|, |v2|, ..., |vn|)                (the ∞-norm, or max norm)

In Matlab, those vector norms are given by the functions norm(v,1), norm(v) (or norm(v,2)), and norm(v,inf).
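
To check the built-in functions against the definitions, try something like this with a small random vector:

    v = randn(6,1);
    norm(v,1),   sum(abs(v))          % 1-norm: sum of absolute values
    norm(v),     sqrt(sum(v.^2))      % 2-norm (the default): Euclidean length
    norm(v,inf), max(abs(v))          % ∞-norm: largest absolute component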


Linear Dependence

Let x1, x2, ..., xm be vectors in ℜn, and let α1, α2, ..., αm be real numbers. Then the vector


y = α1 x1 + α2 x2 + ... + αm xm

is called a linear combination of the vectors; the scalars αi are called the coefficients of the linear combination. If all the αi are zero the linear combination is the trivial one. A set of vectors x1, x2, ..., xm in ℜn is linearly independent if the only linear combination of them that adds up to zero is the trivial one. Otherwise they are linearly dependent. If a set of vectors x1, x2, ..., xm is linearly dependent, there is a nontrivial (i.e., not all equal to 0) set of scalar coefficients α1, α2, ..., αm such that

α1x1 + α2 x2 + ... + αm xm= 0

In the above equation, the αi are scalars and the xi are vectors. At least one of the coefficients is nonzero. As an example, suppose that α2 ≠ 0. Dividing everything by α2, the linear dependence property can be written as

x2 = − ( (α1/α2) x1 + (α3/α2) x3 + ... + (αm/α2) xm )

This says the vector x2 contributes no additional information as far as linear combinations are concerned, because it can be represented using some linear combination of the other vectors. This information theoretic point of view is often helpful in making sense of linear algebra properties and operations.

A more concise way of stating a linear dependence relationship is to use matrix notation. Define X as the n × m matrix with the ith vector xi as its ith column. Define the m-vector (a column vector)

a = [α1   α2   ...   αm]T.

The equation y = α1 x1 + α2 x2 + ... + αm xm can be written as
y = X*a

Linear dependence says some nonzero vector a will give y = 0 = X*a. The advantage of the matrix version goes beyond saving ink and electrons; it is the crucial step in being able to identify BLAS operations and to try to replace lower level BLAS with higher level ones. This gives another good heuristic: a matrix*vector product is just a succinct way of expressing a linear combination.
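
To convince yourself, try something like the following in Matlab (the sizes n = 4 and m = 3 are arbitrary):

    n = 4; m = 3;
    X = randn(n,m);            % columns are the vectors x1, x2, x3
    a = randn(m,1);            % the coefficients alpha1, alpha2, alpha3
    y = X*a;                   % the matrix-vector product ...
    z = a(1)*X(:,1) + a(2)*X(:,2) + a(3)*X(:,3);   % ... equals the explicit linear combination
    norm(y - z)                % zero, up to roundoff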


Subspaces

S is a subspace of ℜn if x+y and α*x are in S whenever α ∈ ℜ is a real scalar and x, y ∈ S. A subspace can be generated by a set of vectors V, by defining S to be the set of all possible linear combinations of vectors in V. A basis for a subspace S is a set of vectors in S such that the vectors are linearly independent and they generate S. For a given subspace S, any two bases B1 and B2 must have the same number of vectors; that number is the dimension of the subspace.
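
In Matlab the dimension of the subspace generated by a set of vectors can be found as the rank of the matrix holding them as columns; a small sketch:

    v1 = [1 0 1]';  v2 = [0 1 1]';
    rank([v1 v2])              % dimension of the subspace of R^3 generated by v1 and v2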


Matrices

An m × n matrix is often defined as a rectangular array of numbers with m rows and n columns. Extreme danger lurks here for computer scientists. The problem is that a matrix is a mathematical entity, while an array is a computer data structure that can be used to hold the matrix. So get it straight now: a matrix can be stored in any data structure (linked list, tree, two queues, etc.). Except in some digital signal processing applications, matrices are always indexed from 1 by scientists, mathematicians, and other creatures that roamed the earth centuries before computer science was invented. However, a 5 × 3 matrix that is indexed starting from 1 can be stored as a 2D array with rows indexed from 0 to 4 and columns indexed from 0 to 2. Or it can have rows indexed from -17 to -13 (at least in Fortran or Blitz++), or be stored in a 1D array indexed from 0 to 14, or stored in a graph data structure, etc.
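
Matlab itself illustrates the distinction: a matrix is presented with two subscripts starting from 1, but its elements sit in a single column-major array that linear indexing exposes:

    A = reshape(1:15, 5, 3);   % a 5 x 3 matrix
    A(2,3)                     % the element in row 2, column 3
    A(12)                      % the same element, via the underlying 1D column-major storage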


Structured Matrices

The term "structure" refers to many things in linear algebra, but the most common use in numerical computing involves the nonzero structure of the matrix. The reason we concentrate on the "nonzero pattern" is because many of the matrices that occur in solving partial differential equations (PDEs) and ordinary differential equations (ODEs) have all but a few of their entries equal to zero. Terms you should know include Other terms that are commonly used include A symmetric matrix A is one where Aij = Aji or equivalently, A = AT.

Although it sounds petty, being able to index from any given starting integer is extremely helpful in coding some numerical linear algebra operations. For example, a "banded matrix" is typically stored by diagonals (instead of by rows or columns). By convention, the main diagonal is numbered as the 0-th diagonal, the first superdiagonal is numbered as diagonal 1, etc. The first subdiagonal is numbered -1, the diagonal below that is numbered -2, etc. It is much easier to read and debug a code using banded matrices if the arrays storing the matrix values can be indexed from -n to n.
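
Matlab's spdiags function uses exactly this numbering for diagonals; a small sketch that builds a tridiagonal matrix from its diagonals -1, 0, and 1:

    n = 5;  e = ones(n,1);
    A = spdiags([-e 2*e -e], [-1 0 1], n, n);
    full(A)                    % tridiagonal: -1 below, 2 on, and -1 above the main diagonal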


Submatrices

Given a matrix A and a set of row/column indices {i1, i2, ..., ik}, the corresponding principal submatrix of A is the k × k matrix consisting of the elements with row and column numbers in that index set. A leading principal submatrix has i1 = 1, i2 = 2, ..., ik = k, for some k. A trailing principal submatrix is similarly defined, for the last k indices. Submatrices are also obtained by partitioning a matrix. Often it is useful to consider an m × n matrix A as a row vector of n column vectors, each from ℜm:
A = [c1 c2 ... cn]

and other times it is convenient to consider A as a column vector of m row vectors, each from ℜn:

A = [ r1T
      r2T
      ...
      rmT ]

In general the vectors ri from a row partitioning are different from the vectors ci given by a column partitioning. When will they be the same?

More generally a matrix can be partitioned into blocks, where each block is itself a matrix.
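
In Matlab, submatrices and partitions are just indexing expressions; for example:

    A = magic(5);              % a 5 x 5 example matrix
    c2 = A(:,2);               % its second column
    r3 = A(3,:);               % its third row
    P  = A([1 3 5],[1 3 5]);   % principal submatrix for the index set {1,3,5}
    L2 = A(1:2,1:2);           % leading principal submatrix of order 2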


Matrix Operations

Adding matrices, multiplying them by a scalar, and similar operations are easily defined in linear algebra. Make sure that you are familiar with multiplying matrices: in the operation C = A*B, the number of columns of A must equal the number of rows of B. If A is m × n and B is n × k, then C is m × k and its (r,c) element is the sum from i = 1 to i = n of Ari*Bic. That is, the rth row of A is multiplied element by element against the cth column of B, and the products are summed. This holds even when Ari, Bic, etc. are subblocks of A and B rather than just real numbers. One special case is y = Ax, where A is m × n, y is m × 1, and x is n × 1. By partitioning A by columns, the equation y = A*x expresses y as a linear combination of the columns of A; the ith column has coefficient xi, the ith component of the vector x. Transpose: if A is an m × n matrix, then AT is the n × m matrix with element (i,j) the same as the (j,i) element of A. Recall that
(AB)T = BT AT

More generally, setting one or more of the dimensions m, n, and k to 1 gives four special cases of matrix multiplication that account for a surprisingly large proportion of scientific computations (a Matlab sketch of all four follows the list).

  1. In C = A*B, if A is 1 × n and B is n × 1, then C is a scalar and the operation is called a dot product, inner product, or scalar product of the vectors A and B.
  2. In C = C + A*B, if A is n × 1 and B is 1 × 1, then C is an n-vector and the operation is called a saxpy or daxpy operation. It is most commonly formulated as y = y + alpha*x, where x and y are n-vectors and alpha is a scalar.
  3. In C = C + A*B, if A is m × 1 and B is 1 × n, the operation is called a rank-one update of C. The reason is because the rank of the matrix being added to C is equal to one. The product A*B in this case is called the outer product of A and B. Any matrix with rank 1 can be written as the outer product of two vectors. Rank-one updates will be common in solving linear systems numerically.
  4. In C = A*B, if A is m × n and B is n × 1, the operation is called a matrix-vector product.
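
Each of the four cases is a single line in Matlab; the sizes below are chosen arbitrarily:

    m = 4; n = 3;
    x = randn(n,1);  y = randn(n,1);  alpha = 2.5;
    A = randn(m,n);  u = randn(m,1);  v = randn(n,1);
    d = x'*y;                  % 1. dot (inner, scalar) product
    y = y + alpha*x;           % 2. axpy operation
    A = A + u*v';              % 3. rank-one update (u*v' is the outer product)
    w = A*v;                   % 4. matrix-vector product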


More Subspaces

The range of an m × n matrix A is the set of all possible linear combinations of the columns of A. Another way to express this is
{y: y = Ax for some vector x}, where braces indicate a set. The range of A, range(A), is a subspace (reality check: it is a subspace of which space, ℜm or ℜn?) The null space of A is the set of all vectors that A maps to the zero vector: {x: Ax = 0}. Stated otherwise, a vector is in the null space of A if its components give a linear combination of the columns of A that sums up to zero. Note that range(A) and null(A) live in different worlds; one is in ℜm, the other in ℜn. Also note that if A is square and of order n, then the dimensions of range(A) and null(A) sum to n.
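
Matlab's rank and null functions make the dimension count easy to check; a small sketch with a singular 3 × 3 matrix:

    A = [1 2 3; 4 5 6; 7 8 9];   % singular, of order n = 3
    r = rank(A)                  % dimension of range(A), here 2
    N = null(A);                 % columns give an orthonormal basis for null(A)
    r + size(N,2)                % the dimensions sum to n = 3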


Inverses

The prototypical problem in numerical computing is to "solve" the equation Ax = b for x, given the coefficient matrix A and right-hand side vector b. When A is square n × n and has full rank, a unique inverse matrix A-1 exists such that A-1A = I = AA-1, where I is the identity matrix of order n. The solution can then be written as x = A-1b. However, in practice you never want to compute or form the inverse matrix A-1 in solving a linear system. Doing so requires more computation and has lower numerical accuracy than Gaussian elimination does. A square matrix A of order n is nonsingular if any (equivalently, all) of the following hold:

  1. A has an inverse.
  2. The columns of A are linearly independent.
  3. The only solution of Ax = 0 is x = 0.
  4. The rank of A is n.

If matrices A and B are both invertible and the same size, then (AB)-1 = B-1 A-1
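
In Matlab this advice means using the backslash operator rather than inv; a quick sketch with a random system:

    n = 500;
    A = randn(n);  b = randn(n,1);
    x1 = A\b;                  % solves Ax = b by Gaussian elimination (preferred)
    x2 = inv(A)*b;             % forms the inverse explicitly: more work, less accuracy
    norm(x1 - x2)              % usually small, but x1 is the one to trust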

When A is not square, we typically want a "least squares" solution to Ax = b. That is, find the vector x that minimizes the 2-norm of the vector Ax - b. This is how you experimentally find the (n1/2, r∞) parameters for the Hockney-Jessup model of a pipelined operation.
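
In Matlab the same backslash operator returns the least squares solution when A has more rows than columns; a sketch with made-up data:

    A = randn(20,3);  b = randn(20,1);
    x = A\b;                   % minimizes the 2-norm of A*x - b
    norm(A*x - b)              % the minimized residual norm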


Matrix Norms

The norm of a vector gives its length, while the norm of a matrix gives the maximum (relative) amount by which it can stretch a vector:

||A||2 = max{||Ax||2/||x||2, over all x ≠ 0}

This may seem to be a circular definition, but it is not. It defines a matrix norm by using vector norms, which are already well-defined. A can be rectangular m × n, so the norm ||Ax||2 in the numerator is for a vector of length m, while the norm ||x||2 in the denominator is for a vector of length n. Mutatis mutandis you can similarly define the 1-norm and ∞-norm of a matrix, but they have an equivalent and easier computation: ||A||1 is the maximum absolute column sum of A, and ||A||∞ is the maximum absolute row sum of A.
This is not a complete definition of matrix norms; with one exception we are going to use the three above, which are examples of operator norms. One norm that is not defined as the maximum of a ratio of vector norms is the Frobenius norm: ||A||F = sqrt(sum of Aij^2 over all i and j).
Like any norm, matrix norms must satisfy the properties that ||A|| ≥ 0 with ||A|| = 0 only when A = 0, that ||α*A|| = |α|*||A|| for any scalar α, and the triangle inequality ||A+B|| ≤ ||A|| + ||B||.
A property that does not hold for all matrix norms but does for the four listed above is

||AB|| ≤ ||A||*||B||.

[Note the difference between this and the triangle inequality, which deals with the sum (not the product) of matrices.] If the matrix norm does satisfy this additional property, it is called a consistent norm.

Matrix norms are most commonly used in numerical analysis, to measure how close a computed matrix is to the ideal (in the Platonic sense) matrix. For example, a common approach to solving a linear system involves factoring its coefficient matrix into the product of a lower and an upper triangular matrix: A = L*U. After performing LU factorization the L and U factors are likely to have some errors from finite precision arithmetic. In that case, ||A - L*U|| provides a measure of how far off the computed factorization is from the exact one. Here, A, L, and U refer to matrices. (Slight digression: in LU factorization the array A is usually overwritten with the combined L and U factors.) A less common use for matrix norms is to define information-compressing approximations to large operators, which is used in contexts such as text retrieval and web search engines.

For large matrices A, avoid using the 2-norm and instead use almost any other norm. The two-norm of A equals its largest singular value (for symmetric A, the largest absolute value of its eigenvalues), and that is expensive to compute. Use the matrix 1-norm or matrix ∞-norm, or the "Frobenius" norm, which is the 2-norm of the vector of length m*n obtained by stacking all the elements of A into one vector.
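
A quick Matlab comparison (the matrix is random and the timings are only indicative):

    A = randn(2000);
    tic; norm(A);       toc    % 2-norm: needs the largest singular value, the slow one
    tic; norm(A,1);     toc    % maximum absolute column sum
    tic; norm(A,inf);   toc    % maximum absolute row sum
    tic; norm(A,'fro'); toc    % Frobenius norm, the same value as norm(A(:))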

Throughout the remainder of this course, all norms (matrix or vector) will be the 2-norm, unless explicitly stated otherwise. So the subscript 2 will be dropped hereafter.
However, when numerically computing a matrix norm, only the 1, ∞, or Frobenius norms will be used, for efficiency.


Orthogonality and Orthogonal Matrices

Two vectors x and y in ℜn are orthogonal if xTy = 0, and a set of vectors is orthonormal if the vectors are pairwise orthogonal and each has 2-norm equal to 1. A square matrix Q is orthogonal if its columns are orthonormal, that is, QTQ = I. An orthonormal basis for a subspace is especially convenient to work with. Let {qi, i = 1, 2, ..., k} be an orthonormal basis for a subspace S. Then if x is a vector in S, its representation as a linear combination of the qis is easily computed as
x = (q1Tx)q1 + (q2Tx)q2 + ... + (qkTx)qk.

If you create the n × k matrix Q with columns consisting of those vectors qi, then the coefficients of the linear combination are the entries in the vector QTx. Finding the representation in a non-orthonormal basis is much more difficult computationally.
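
A small sketch using Matlab's economy-size QR factorization to produce an orthonormal basis:

    n = 6;  k = 3;
    [Q,R] = qr(randn(n,k),0);  % Q is n x k with orthonormal columns
    x = Q*randn(k,1);          % some vector in the subspace spanned by those columns
    c = Q'*x;                  % the coefficients of x in the orthonormal basis
    norm(x - Q*c)              % zero, up to roundoff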

Because an orthogonal matrix does not change the 2-norm of a vector, multiplying vectors and matrices by orthogonal matrices is highly stable numerically. In floating point arithmetic you can multiply a vector by millions of orthogonal matrices and have only mild loss of precision at the end. Multiplying by an orthogonal matrix is sometimes called "applying an orthogonal transformation".


Eigenvalues

Eigenvalues are only defined for square matrices. If there is a scalar λ and a nonzero vector x such that Ax = λx, then λ is an eigenvalue of A and x is the corresponding eigenvector. Even if A has all real-valued entries, it may have complex-valued eigenvalues. Geometrically the relation Ax = λx says that in the direction given by the vector x, the action of A is to stretch or shrink the vector by a factor of |λ|. A (square, of course) matrix of order n has n eigenvalues, counted with multiplicity, but those may not be distinct values - some eigenvalues may be repeated, possibly with different eigenvectors. Numerically you will almost never encounter eigenvalues with this kind of multiplicity; floating point arithmetic invariably perturbs the array values representing the matrices and vectors, and so the computed eigenvalues will seem distinct.

Some properties of eigenvalues:

  1. If A is symmetric, all of its eigenvalues are real, and A has n eigenvectors that can be chosen to be orthonormal, so they form a basis for ℜn.
  2. The sum of the eigenvalues of A equals the trace of A, and the product of the eigenvalues equals the determinant of A.
  3. A is singular if and only if 0 is an eigenvalue of A.

When A is not symmetric, the eigenvalues may be complex and the eigenvectors may not form a basis for ℜn. This suggests that maybe eigenvalues are not the right thing to look at for nonsymmetric matrices, and there is a better approach, the singular value decomposition, for factoring a nonsymmetric or even a rectangular matrix.
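
Matlab's eig function shows the contrast quickly (the matrices are random, so the particular values will vary):

    A = randn(5);
    eig(A)                     % nonsymmetric: some eigenvalues are typically complex
    S = A + A';                % a symmetric matrix built from A
    eig(S)                     % symmetric: all eigenvalues are real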


Next: LU Factorization