Practical Use of the BLAS

The basic rules so far are:

  1. Whenever possible, use a BLAS operation and call the optimized built-in library.
  2. Whenever possible, convert the algorithm to use a higher level BLAS function.
  3. Within a single BLAS level, prefer the kernel with the lowest ratio of memory references to flops.

Four versions of matrix-matrix multiply have been shown, five counting the direct call to the BLAS routine dgemm(). Now consider how to find and use the BLAS library.
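
To put rough numbers on rules 2 and 3, here is a small C++ sketch that tabulates memory references and flops for a few kernels. The counts are the usual leading-order estimates and assume each operand is read or written exactly once; the struct and function names are made up for this illustration and are not part of any BLAS library.

```cpp
// Leading-order cost model: memory references and flops for one call
// on vectors of length n or n-by-n matrices. These are estimates,
// assuming every operand is touched exactly once.
struct KernelCost {
    double mem_refs;
    double flops;
    double ratio() const { return mem_refs / flops; }
};

// Level 1, dot product x'y: read x and y; n multiplies and n adds.
KernelCost ddot_cost(double n)  { return {2 * n, 2 * n}; }

// Level 1, y = a*x + y: read x, read y, write y; n multiplies and n adds.
KernelCost daxpy_cost(double n) { return {3 * n, 2 * n}; }

// Level 2, y = A*x + y: read A, x, y and write y; 2n^2 flops.
KernelCost dgemv_cost(double n) { return {n * n + 3 * n, 2 * n * n}; }

// Level 3, C = A*B + C: read A, B, C and write C; 2n^3 flops.
KernelCost dgemm_cost(double n) { return {4 * n * n, 2 * n * n * n}; }
```

Under this model ddot has a lower ratio than daxpy at the same level (rule 3), and the ratio falls from Level 1 to Level 2 to Level 3 for any reasonably large n (rule 2). Only dgemm has a ratio that keeps shrinking as n grows.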

BLAS Nomenclature

How did I know to look for something as weirdly named as "dgemm()"? The BLAS have a (mostly) systematic naming convention.
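
As a rough illustration of the pattern, the C++ sketch below splits a Level 2 or Level 3 routine name into a precision letter, a two-letter matrix type, and an operation code. The function decode_blas_name() is hypothetical, not part of any BLAS library, and its tables list only a handful of the codes. Level 1 names such as daxpy do not follow this three-part pattern, which is part of why the convention is only "mostly" systematic.

```cpp
#include <map>
#include <string>

// Hypothetical helper: decode names of the form
//   <precision letter><two-letter matrix type><operation>
// Only a few common codes are listed here.
std::string decode_blas_name(const std::string& name) {
    static const std::map<char, std::string> precision = {
        {'s', "single precision real"},
        {'d', "double precision real"},
        {'c', "single precision complex"},
        {'z', "double precision complex"}};
    static const std::map<std::string, std::string> matrix_type = {
        {"ge", "general"},
        {"sy", "symmetric"},
        {"tr", "triangular"}};
    static const std::map<std::string, std::string> operation = {
        {"mm", "matrix-matrix multiply"},
        {"mv", "matrix-vector multiply"}};

    if (name.size() < 5) return "unrecognized";
    auto p = precision.find(name[0]);
    auto m = matrix_type.find(name.substr(1, 2));
    auto o = operation.find(name.substr(3));
    if (p == precision.end() || m == matrix_type.end() || o == operation.end())
        return "unrecognized";
    return p->second + ", " + m->second + " matrix, " + o->second;
}
```

With these tables, decode_blas_name("dgemm") reads as "double precision real, general matrix, matrix-matrix multiply", while a Level 1 name such as "daxpy" comes back as "unrecognized".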

How do you find out what BLAS routines exist? Use the man pages, or plain old Google. You can also use the BLAS Quick Reference Card (a PostScript file), although learning to read the card takes some practice.

Part of the naming obscurity comes from a limitation of Fortran compilers when the BLAS were first proposed. Depending on the compiler, an identifier in Fortran could have 6 to 8 characters, so the only safely portable way to proceed was to limit names to 6 characters. Later standards raised that limit, to 31 characters in Fortran 90 and 63 in Fortran 2003, which makes a major difference in readability. A function named matrix_matrix_multiply() is readable and easy to figure out; the name dgemm() is not.

Slightly related: Fortran has a built-in function for matrix-matrix multiply, called matmul(), which handles both matrix*vector and matrix*matrix multiplication. The Fortran dot product function is called dot_product(). These are "intrinsic" functions, meaning they are part of the language standard and are available without having to link in any libraries. A good BLAS library, however, should be at least as fast as the language intrinsic version, and in some cases much faster.


BLAS Calling Conventions

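When using the BLAS, beware that the standard interface follows Fortran calling conventions; in particular, arguments are passed by reference.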

The matrix-matrix multiply results obtained by calling dgemm came from a C++ code calling the standard interfaces to the BLAS, so the exact same code works on all platforms (Sun, IBM, SGI, Intel) without changing any of the source. Not a big deal ... until you have to modify all 143 of the BLAS calls in a 128k line fluid dynamics code.
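
For a feel of what a Fortran-style interface looks like from C++, here is a minimal sketch. It assumes the usual Fortran conventions: every argument, scalars included, is passed by address, and matrices are stored column by column. The routine dgemm_sketch() is a hypothetical stand-in written for this example, with the same argument list as dgemm but only the no-transpose case, so that the code runs without linking any library.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in with the dgemm argument list:
//   C = alpha*A*B + beta*C, with A m-by-k, B k-by-n, C m-by-n.
// With a real BLAS you would declare the library routine instead, commonly as
//   extern "C" void dgemm_(...same arguments...);
// The trailing underscore is a common convention, but it depends on the
// Fortran compiler, so treat it as an assumption.
void dgemm_sketch(const char* transa, const char* transb,
                  const int* m, const int* n, const int* k,
                  const double* alpha, const double* a, const int* lda,
                  const double* b, const int* ldb,
                  const double* beta, double* c, const int* ldc) {
    assert(*transa == 'N' && *transb == 'N');  // only the no-transpose case
    const int M = *m, N = *n, K = *k;
    const int LDA = *lda, LDB = *ldb, LDC = *ldc;
    assert(LDA >= M && LDB >= K && LDC >= M);
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < M; ++i) {
            double sum = 0.0;
            for (int p = 0; p < K; ++p)
                sum += a[i + p * LDA] * b[p + j * LDB];  // column-major indexing
            c[i + j * LDC] = *alpha * sum + *beta * c[i + j * LDC];
        }
    }
}

// Multiply a 2x3 matrix by a 3x2 matrix; returns C in column-major order.
std::vector<double> multiply_example() {
    const int m = 2, n = 2, k = 3;
    const double alpha = 1.0, beta = 0.0;
    // A = [1 2 3; 4 5 6], stored column by column.
    const std::vector<double> a = {1, 4, 2, 5, 3, 6};
    // B = [7 8; 9 10; 11 12], stored column by column.
    const std::vector<double> b = {7, 9, 11, 8, 10, 12};
    std::vector<double> c(m * n, 0.0);
    // Even the scalar arguments go in by address.
    dgemm_sketch("N", "N", &m, &n, &k, &alpha, a.data(), &m,
                 b.data(), &k, &beta, c.data(), &m);
    return c;
}
```

The product here is [58 64; 139 154], which comes back column by column as 58, 139, 64, 154. Forgetting the column-major layout and filling the arrays row by row is probably the easiest way to get a wrong answer from an otherwise correct call.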


1 Fortran also could (and still can) pass data using "common blocks", a software engineering nightmare because they do not appear in any argument list. Since 1989 Fortran can also pass arguments by value, but that was long after the BLAS were cast in the software equivalent of concrete.
2 Common blocks in Fortran allow a user to align doubles on single word boundaries, and the compiler and run time system must honor that foolishness. Another reason common blocks are regarded as evil by computer scientists.


Next page: A brief linear algebra review, preparatory to deriving a fast linear system solver using the BLAS.