Errors in Floating Point Arithmetic


Different Algorithms have Different Numerical Properties

The detailed study of numerical error propagation in algorithms is the province of numerical analysis, and is not the central concern of this course (that is instead the implementation of algorithms, and mapping their computations efficiently to computer hardware). Still, it is useful to see how errors propagate in even straightforward evaluations. Consider evaluating the polynomial (x-1)^6, which can be expanded to

p(x) = (x-1)^6 = x^6 - 6x^5 + 15x^4 - 20x^3 + 15x^2 - 6x + 1

To save typing in those coefficients, Matlab has a function poly(v), which takes a vector of roots of the polynomial and returns the coefficients in terms of powers of x:

>> poly(ones(6,1))
ans =
     1    -6    15   -20    15    -6     1
So here is a script that plots the polynomial on increasingly small subintervals around x = 1. Note that the polynomial has a single root, of multiplicity 6, at x = 1.
%  -------------------------------------------
%  Generate the coefficients of the polynomial
%           p(x) = (x-1)^6
%  -------------------------------------------
   coeffs = poly(ones(6,1));
   k = 0;
%  ---------------------------------------------
%  Look at the region around 1, at increasingly
%  smaller subintervals.
%  ---------------------------------------------
   for delta = [.1 .01 .008 .007 .005  .003 ]
     x = linspace(1-delta,1+delta,100)';
%    ---------------------------------------------------
%    Evaluate the polynomial defined by the coefficients
%    ---------------------------------------------------
     y = polyval(coeffs,x);
     k = k+1;
     subplot(2,3,k)
     plot(x,y,x,zeros(size(x)))
     title(['d=',num2str(delta)])
     axis([1-delta 1+delta -max(abs(y)) max(abs(y))])
   end
The graph it produces is:

[zoom1.gif]

Plot of (x-1)^6 = x^6 - 6x^5 + 15x^4 - 20x^3 + 15x^2 - 6x + 1

At closer and closer views, it seems that the polynomial has dozens of roots near 1. Now consider the view when the polynomial is computed with the Matlab expression (x-1).^6:

[zoom2.gif]

Plot of (x-1)^6

This shows that there is no problem with the second formulation ... or at least not down to the resolutions we consider. The algorithm chosen has made a major difference in the computational accuracy. By the way, where did the errors come from? The computation of the coefficients is exact, so the damage has to occur either in computing the powers of x, or in summing the terms of the polynomial. Or both. Bum numbers can occur all over the place, and the following sections give some examples, first demonstrated by Kulisch and Miranker.
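First, though, a quick spot check on the polynomial itself. This sketch evaluates both formulations at an arbitrarily chosen point near the root; the true value of (x-1)^6 there is about 1e-18, while the terms of the expanded sum are of order 1 and each carries rounding error of order 1e-16, so cancellation buries the answer. (The exact digits you see will vary by machine.)

   coeffs = poly(ones(6,1));
   x0 = 1.001;            % an arbitrary point near the root
   polyval(coeffs,x0)     % expanded form: swamped by rounding errors
   (x0-1)^6               % factored form: close to the true 1e-18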

Dot Products

Let the vectors a and b be:
a = [2.718281828;
    -3.141592654;
     1.414213562;
     0.5772156649;
     0.3010299957];

b = [1486.2497;
   878366.9879;
      -22.37492;
  4773714.647;
        0.000185049];
The correct value is -1.00657107E-11. Matlab (and C and Fortran) give 1.025188136829667e-10. Wrong magnitude, wrong sign. However, that is if you do the products first and then sum the numbers. If you write the code using the usual dot-product loop, the result at least has the right sign - try it.
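Here is a sketch of both versions, using the vectors a and b defined above; whether the two differ, and by how much, depends on your machine and Matlab version:

   s1 = sum(a.*b)           % products stored first, then summed
   s2 = 0;
   for i = 1:length(a)      % the usual running-sum dot-product loop
     s2 = s2 + a(i)*b(i);
   end
   s2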

Yup, some of these examples are cooked-up artificial cases. But having a dot product that sums to near zero is really common in scientific computing: e.g., the two-norm of an error vector in an iterative method, or, in optimization, the angle between a function's gradient vector and the tangent plane of the constraint functions. The other examples may crop up less often, but are deadly when they do. You need to know when the problem is in your code, when it is in the mathematical model the application scientist is using, and when it is in the algorithm chosen. [Shifting blame to someone else is one of the big payoffs from knowing what is causing computations to go haywire.]

Expression Evaluation

Let x = 192119201 and y = 35675640; when you evaluate

E = (1682*x*y^4 + 3*x^3 + 29*x*y^2 - 2*x^5 + 832)/107751

double precision floating point arithmetic gives 7.180560037061026e+20. The correct value is 1783.
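The terms in the numerator are of order 10^41, so with only about 16 significant digits, double precision loses everything in the cancellation. As a sketch (the symbolic half assumes you have the Symbolic Math Toolbox, whose sym type does exact rational arithmetic):

   x = 192119201;  y = 35675640;
   E = (1682*x*y^4 + 3*x^3 + 29*x*y^2 - 2*x^5 + 832)/107751          % wildly wrong
   xs = sym(192119201);  ys = sym(35675640);
   Es = (1682*xs*ys^4 + 3*xs^3 + 29*xs*ys^2 - 2*xs^5 + 832)/107751   % exactly 1783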

Linear System Solving

Solve the system Ax = b, where
b = [2; 0]

A = [64919121  , -159018721
     41869520.5, -102558961]
The true solution is x = [205117922; 83739041], while the computed solution is x = [1.000000009499609e+00; 4.082482943420623e-01].
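You can reproduce this with the backslash operator (a sketch; the exact digits of the computed solution depend on your Matlab version and hardware):

   A = [64919121     -159018721
        41869520.5   -102558961];
   b = [2; 0];
   xhat = A\b             % computed solution - nowhere near the true one
   norm(A*xhat - b)       % yet the residual can look deceptively small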

Just what condition your condition is in

A critical distinction to make in numerical analysis is between conditioning and stability. A badly conditioned problem is one that cannot be solved accurately by any numerical method (except by dumb blind luck) - it is too "close", in some sense, to a nearby problem of its class that cannot be solved at all. So for the linear system above, if we can slightly perturb the coefficients and get a singular matrix, then the problem is ill-conditioned (strictly, the definition of the system's "condition number" would involve the values in b as well, but using that of A alone is good enough). See whether that is what caused the bad solution for Ax = b above, by checking the condition number of the matrix in Matlab. Don't know how to do that? Gee, if there were just some way in Matlab to find where a particular function is, sort of a place to lookfor help, maybe.

A stable algorithm will return an accurate solution to a well-conditioned problem; it will produce pretty much the same garbage as anything else on an ill-conditioned one. Stability is a feature of algorithms and their implementations, which we can control. Conditioning is a feature of the problem itself, which we cannot control - except by going back to the application and reformulating the mathematical model to generate more favorable subproblems.

To find what causes these errors, we need to take a more detailed view of the computer representation of real numbers.


Next: Floating Point Number Representation
