Basics of Linear Algebra
This is not an exhaustive review, but instead is primarily a listing
of some terminology that we need to use. We are not concerned about general
vector spaces, so we can restrict attention to the space Rn, the
set of n-vectors with elements drawn from the reals. By default, x in Rn
is a column vector (an n × 1 matrix). If we need a row vector
it will be indicated by transposing a column vector: xT.
In Matlab a transpose is indicated by a single quote mark,
e.g. x'. What if x is a complex vector in Matlab? See what x' is then.
All of the vectors and matrices used in P573 will be real-valued unless
otherwise explicitly stated.
Vector Norms
There are several "norms" or measures of vectors.
The reason for specifying these as
vector norms is because there are also
matrix norms, defined later in this page.
For a vector v with components
v1), v2), ..., vn),
the three most common norms used are
- 2-norm: sqrt( sum from i = 1 to n of (vi * vi))
- 1-norm: the sum from i = 1 to n of | vi) |
- infinity norm: the maximum value over i = 1 to n, of | vi) |
In Matlab, those vector norms are given by functions
- norm(v) = norm(v, 2) = 2-norm of v
- norm(v, 1) = 1-norm of v
- norm(v, inf) = norm(v, 'inf') = infinity-norm of v
For large matrices A, avoid using the 2-norm and alway use any other norm.
For square matrices A, the two-norm is the largest absolute value of
the eigenvalues of A. This can be expensive (in time and computation) to compute.
Linear Dependence
Let x1, x2, ..., xm be vectors in Rn,
and let a1, a2, ..., am be real numbers.
Then the vector
y = a1 x1 + a2 x2 + ... + am
xm
is called a linear combination of the vectors; the scalars ai
are called the coefficients of the linear combination. If all the ai
are zero the linear combination is the trivial one. A set of vectors
x1, x2, ..., xm in Rnare linearly
independent if the only linear combination of them that adds up to zero,
is the trivial one. Otherwise they are called linearly dependent. If a set
of vectors x1, x2, ..., xmis linearly dependent,
there is a nontrivial (i.e., not all equal to 0) set of scalar coefficients
a1, a2, ..., am such that
a1x1 + a2 x2 + ... + am
xm= 0
In the above equation, the ai are scalars and the xi are vectors.
[This violates the nomenclature convention discussed in class, mainly because
most people don't have Greek letters supported by default in their browsers.]
At least one of the coefficients is nonzero. As an example, suppose that
a2 ≠ 0.
Then the linear dependent property can be written as
x2 = -( a1/a2 x1 + ... + am/a2
xm)
This says the vector x2 contributes no additional information as far as linear
combinations are concerned, because it can be represented using some linear
combination of the other vectors. Keep in mind this information theoretic
point of view; it often enables you to make sense of different linear
algebra properties and operations.
Subspaces
S is a subspace of Rn if x+y and a*x are in S whenever a
is a real scalar and x and y are vectors in S. A subspace can begenerated
by a set of vectors V, by defining S to be the set of all possible linear
combinations of vectors in V. A basis for a subspace S is a set B
of vectors in S such that B is linearly independent, and B generates S. For
a given subspace S, any two bases B1 and B2must have
the same number of vectors; that number is the dimensionof the subspace.
Typically we stack together the vectors that form a basis of a subspace,
putting them as columns of matrix. If the entire space is of dimension n,
and the subspace is of dimension k, that creates an n × k matrix B.
This gives a shorter way of writing the linear dependence relation
y = a1 x1 + a2 x2 + ... + am
xm
by using the matrix with the basis vectors as its column:
y = B*a
The vector a in y = B*a is defined by a = (a1 , a2 , ... , ak).
( Slight digression: sometimes I will denote multiplication by
juxtaposition, e.g. Ba. Other times, particularly when emphasis is needed on
the multiplication operation, I'll use B*x.)
This gives another good heuristic: a matrix * vector product is just a
succint way of expressing a linear combination.
Matrices
An m × n matrix is often defined as a rectangular array of numbers with m
rows and n columns. Extreme danger lurks here for computer scientists. The
problem is that a matrix is a mathematical entity, while
an array is a computer data structure that can be used to hold the
matrix. So get it straight now: a matrix can be stored in any data structure
(linked list, tree, two queues, etc).
Except in some digital signal processing applications,
matrices are always indexed from 1 by scientists,
mathematicians, and other creatures that roamed the earth centuries before
computer science was invented. However, a 5 × 3 matrix that is indexed
starting from 1 can be stored as a 2D array with rows indexed from 0 to 4 and columns
indexed from 0 to 2. Or it can have rows indexed from -17 to -13 (at least
in Fortran or Blitz++), or be stored in a 1D array indexed from 0 to 14, or
stored in a graph data structure, .... etc.
Structured Matrices
The term "structure" refers to many things in linear algebra, but the
most common use in numerical computing involves the nonzero structure of the
matrix. The reason we
concentrate on the "nonzero pattern" is because many of the matrices that
occur in solving partial differential equations (PDEs) and ordinary
differential equations (ODEs) have all but a few of their entries equal to
zero. Terms you should know include
- a diagonal matrix: Aij = 0 for i ≠ j
- lower triangular matrix: Aij = 0 for j > i
- upper triangular matrix: Aij = 0 for i > j
- strictly lower triangular matrix: Aij = 0 for j >= i
- strictly upper triangular matrix: Aij = 0 for i >= j
- upper Hessenberg matrix: Aij = 0 for i > j+1
- tridiagonal matrix: Aij = 0 for |i - j| > 1
Other terms that are commonly used include
- the superdiagonal part of a matrix: the entries strictly above the
main diagonal
- the superdiagonal of a matrix: the first diagonal above the main diagonal
of a matrix
- the subdiagonal part of a matrix: the entries strictly below the main
diagonal
- the subdiagonal of a matrix: the first diagonal below the main diagonal
of a matrix
A symmetric matrix A is one where Aij = Aji or equivalently,
A = AT.
Although it sounds petty, being able to index from any given starting
integer is extremely helpful in coding some numerical linear algebra
operations. For example, a "banded matrix" is typically stored by
diagonals (instead of by rows or columns). By convention, the main
diagonal is numbered as the 0-th diagonal, the first superdiagonal is
numbered as the 1-th diagonal, etc. The first subdiagonal is numbered
-1, the diagonal below that is numbered -2, etc.
It is much easier to read and debug a code using banded matrices
if the arrays storing the matrix values can be indexed from -n to n.
Submatrices
Given a matrix A and a set of row/col indices {i1, i2,
..., ik}, the corresponding principal submatrix of A is
the k × k array consisting of elements with row and column numbers in that
index set. A leading principal submatrix has i1 = 1, i2
= 2, ..., ik = k, for some k. A trailing principal submatrix
is similarly defined, for the last k possible indices. Submatrices are also
obtained by partitioning a matrix. Often it is useful to consider an m
× n
matrix A as a row vector of n column vectors, each from Rm:
A = [a1 a2 ... an]
and other times it is convenient to consider A as a column vector of m row
vectors, each from Rn:
A = [ a1T
a2T
.
.
.
anT]
Note that for a given matrix A, the ai from a row partitioning
is different from that given by a column partitioning (when will they be the
same?). Also, more generally
a matrix can be partitioned into blocks, where each block is itself a matrix.
Matrix Operations
Adding matrices, multiplying them by a scalar, and similar operations are
easily defined in linear algebra. Make sure that you are familiar with
multiplying matrices: in the operation C = A*B, the number of columns of A
must equal the number of rows of B. If A is m × n and B is n × k, then C
is m × k and its (r,c) element is the sum from i = 1 to i = k of Ari*Bic.
That is, the rth row of A is multiplied times corresponding elements
in the cth column of B. This holds even when Ari, Bic,
etc. are subblocks of A and B rather than just real numbers. Note the special
case of y = Ax, where A is m × n and y is m × 1, x is n × 1. Then by looking
at A partitioned by columns, the equation y = A*x expresses y as a linear
combination of the columns of A; the ithcolumn as coefficient
xi, the ith component of the vector x. Transpose:
if A is an m × n matrix, then AT is the n × m matrix with element
(i,j) the same as the (j,i) element of A. Recall that
(AB)T = BT AT
There are four special cases of matrix multiplication that we'll spend
a lot of time with.
- In C = A*B where A is 1xn and B is n × 1, then C is a scalar and
the operation is called a dotproduct, inner product, or the scalar
product of the vectors A and B.
- In C = C + A*B, if A is n × 1 and B is 1 × 1, then C is an n-vector
and the operation is called a saxpy or daxpy operation.
It is most commonly formulated as y = y + alpha*x, where x
and y are n-vectors and alpha is a scalar.
- In C = C + A*B, if A is m × 1 and B is 1 × n, the operation is called
a rank-one update of C. The reason is because the rank of
the matrix being added to C is equal to one. The product A*B in
this case is called the outer product of A and B.
Any matrix with rank 1 can be written as the outer product of
two vectors. Rank-one updates will be common in solving linear systems
numerically.
- If if A is m × n and B is n × 1, the operation is called a
matrix-vector product.
More Subspaces
The range of an m × n matrix A is the set of all possible linear combinations
of the columns of A. Another way to express this is
{y: y = Ax for some vector x}, where braces indicate a set.
The range of A, range(A), is a subspace (reality
check: it is a subspace of which space, Rm or Rn?) The
null space of A is the set of all vectors that A maps to the zero
vector: {x: Ax = 0}. Stated otherwise, a vector is in the null space of A
if its components give a linear combination of the columns of A that sums
up to zero. Note that range(A) and null(A) live in different worlds; one
is in Rm, the other in Rn. Also note that if A is square
and of order n, then the dimensions of range(A) and null(A) sum to n.
Inverses
The most prototypical problem in numerical computing is to "solve" the equation
Ax = b for x, given coefficient matrix A and right-hand side vector
b. When A is square n × n and has full column rank, then a unique
inverse matrix A-1 exists such that A-1A = I = AA-1,
where I is the identity matrix of order n. The solution can be written as
x = A-1b. However, in practice you never want to compute or form
the inverse matrix A-1 in solving a linear system. Doing so requires
more computation and has lower numerical accuracy than Gaussian elimination
does. A square matrix is nonsingular if any of the following hold:
- An inverse matrix A-1 exists.
- A has full column rank (columns are a linearly independent set of vectors)
- A has full row rank (rows are a linearly independent set of vectors)
- The only solution to Ax = 0 is x = 0.
- null(A) = {0}, the zero vector of order n.
- dim(range(A)) = n
When A is not square, we typically want a "least squares" solution
to Ax = b. That is, find the vector x that minimizes the 2-norm of the vector Ax
- b. This is how you experimentally find the
(n½, rinf) parameters for the Hockney-Jessup
model of a pipelined operation.
Matrix Norms
The norm of a vector gives its length, while the norm of a matrix gives the
maximum (relative) amount by which it can stretch a vector:
||A||2 = max{||Ax||2/||x||2, over all nonzero vectors x}.
This may seem to be a circular definition, but it is not. It defines a
matrix norm by using vector norms, which are already
well-defined..
A can be m × n, so the norm ||A||2 in the numerator
is for a vector of length m, while the norm ||x||2
in the denominator is for a vector
of length n. Mutatis mutandi you can similarly define the 1-norm
and infinity norm of a matrix, but they have an equivalent but easier computation:
- ||A||1 = max{||aj||1, j = 1, 2, ...
n}, where aj is the j-th column of A
- ||A||inf = max{||ai||1, i = 1, 2, ... m}, where ai
is the i-th row of A
This is not a complete definition of matrix norms; with one exception we are
going to use the three above, which are examples of
operator norms. One norm that is not defined
as the maximum of a ratio of vector norms is the Frobenius norm:
- ||A||F = sqrt [ sum(A(i,j)2, over i = 1, 2, ...
m and j = 1, 2, ... n} ]
Like any norm, matrix norms must satisfy the properties
- Positivity: ||A|| >= 0, with equality only when A = 0
- Linearity: ||sA|| = |x|*||A||, for any scalar s
- Triangle inequality: ||A + B|| <= ||A|| + ||B|| for any conformant A and B
A property that does not hold for all norms but does for the four matrix
norms listed
above is
||AB|| <= ||A||*||B||.
[Note the difference between
this and the triangle inequality, which deals with the sum (not the product)
of matrices.] If the matrix norm does satisfy this additional property, it
is called a
consistent norm.
Matrix norms are most commonly used in numerical analysis,
to measure how close a computed matrix is to the ideal (in the Platonic
sense) matrix. For example, a common approach to solving a linear system
involves factoring it into the product of a lower and upper triangular
matrices: A = L*U.
After performing LU factorization the L and U factors are likely to have
some errors from finite precision arithmetic.
In that case, ||A-L*U|| would provide a measure of how far off the
computed factorization is from the exact one. Here, A, L,
and U refer to matrices. (Slight digression: in LU factorization
the array A is usually overwritten with the combined L and U factors.) A
less common use for matrix norms is to define information compression approximations
to large operators. That is used in the context of text retrieval or
web search engines.
Throughout the remainder of this course, all norms (matrix or vector)
will be the 2-norm, unless explicitly stated otherwise. So
the subscript 2 will be dropped hereafter.
Orthogonality and Orthogonal Matrices
- Two vectors x and y are orthogonal if their inner product is
0.
- The Cauchy relation is xTy = ||x||*||y||*cos(theta),
where theta is the angle between x and y. So two vectors are orthogonal
if at least one of them is the zero vector, or if the angle between them
is 90 degrees. This also implies that xTx = ||x||2,
since the angle a vector makes with itself is 0 degrees.
- A set of vectors is orthogonal if they are pairwise mutually
orthogonal.
- A set of vectors is orthonormal if it is orthogonal, and
each vector in it has norm 1 ( ||xi|| = 1, for all i ).
- An m × n matrix has orthonormal columns if the set of
vectors formed by its columns is an orthogonal set.
- An m × n matrix Q with orthonormal columns satisfies QTQ
= I.
- A matrix Q is orthogonal if it is square and has orthonormal columns.
This also implies it has orthonormal rows. This in turn means
that the inverse of an orthogonal matrix is its transpose (computationally,
that makes it darned easier to solve a linear system with an orthogonal matrix
as a coefficient matrix).
- If Q is orthogonal, then ||QA|| = ||A|| for any conformant matrix A,
including the case where A is n × 1 (a vector). This equality holds for both
the 2-norm and the Frobenius-norm, but not the one-norm or infinity-norm.
- If Q is orthogonal, ||Q|| = 1.
An orthonormal basis for a subspace is especially convenient to work with.
Let {qi, i = 1, 2, ..., k} be an orthonormal basis for a
subspace S. The if x is a vector in S, its representation as a linear
combination of the qis is easily computed as
x = (q1Tx)q1
+ (q2Tx)q2 + ... + (qkTx)qk.
If you create the n × k matrix Q with columns consisting of
those vectors qis , then the coefficients of the linear combination
are the entries in the vector QTx. Finding the representation
in a non-orthonormal basis is much more difficult computationally.
Because orthogonal matrices don't change the (2- or Frobenius-) norm, they
are highly stable numerically when multiplying by them.
In floating point arithmetic you can multiply a vector by millions of orthogonal
matrices and have only
mild loss of precision at the end.
Multiplying by an orthogonal matrix is
sometimes called "applying an orthogonal transformation".
Eigenvalues
Eigenvalues are only defined for square matrices. If there is a scalar
s and nonzero vector x such that Ax = sx, then s is an eigenvalue
of A and x is the corresponding eigenvector. Even if A has all
real-valued entries, it may have complex-valued eigenvalues. Geometrically
the relation Ax = sx says that in the direction given by the vector x, the
action of A may stretch or reverse the vector by a factor of |s|, but it
will not change the vector's direction
otherwise. A (square, of course) matrix of order n has n eigenvalues,
but those may not be distinct values - some eigenvalues may be repeated
with different eigenvectors.
Numerically you will almost never encounter eigenvalues with this
kind of "multiplicity"; floating point arithmetic invariably perturbs the
array values representing the matrices and vectors, and so the eigenvalues
will seem distinct.
- A symmetric matrix has all real eigenvalues (no complex-valued ones).
This is easy to prove - try it.
- Define X = [x1 x2 ... xn],
the matrix whose columns are the eigenvectors of A. Let S be the diagonal
matrix with the corresponding eigenvalues along its main diagonal. Then
AX = XS from the definition of eigenvectors. If X is nonsingular,
then A = XSX-1
- If A is symmetric, it has a full set of n eigenvectors which can be chosen
to be orthonormal. In that case, A = QSQT, where Q is orthogonal.
This is called the Schur decomposition of A. Because
orthogonal transformations are so friendly to numerical computing,
almost always a symmetric matrice's eigenvectors are chose to be
orthogonal this way.
- When A is symmetric, you can represent any n-vector x as a linear combination
of the (orthonormal) eigenvectors of A. This is called the eigendecomposition
of x (with respect to A).
When A is not symmetric, the eigenvalues may be complex and the eigenvectors
may not form a basis for Rn. This suggests that maybe eigenvalues are
not the right thing to look at for nonsymmetric matrices, and there is a
better approach, the singular value decomposition, for decomposing a
nonsymmetric or even a rectangular matrix.
Next page: LU Factorization
- Started: 17 Aug 1995
- Updated: Fri Jan 30 10:26:42 EST 2004
- Updated: Sun 09 Sep 2007 to start moving to html math symbols
- Last Modified: Thu 29 Oct 2009, 12:24 PM