Basic Architecture Ideas for High Performance


Getting good performance from codes, or just understanding bad performance when it occurs, requires some understanding of basic computer architecture. Computer architecture is a deep subject that goes into the details of machine organization and its interaction with hardware. Fortunately, scientific computing requires only three general ideas, which have held throughout the history of electronic computers:
  1. Data locality
  2. Pipelining
  3. Parallelism
These three concepts are the basis of all high-performance architectures. Here is a rough outline of what we need:
  1. Generalities
  2. Memory systems
    1. CPU speed vs. memory access speed
    2. Memory banks
    3. Memory hierarchies
  3. Cache model and examples
  4. Pipelining
    1. Instruction pipelining
    2. Loop unrolling
    3. Flop and memory pipes
  5. Enhancing data locality and pipelining in codes
Memory banks were not covered in class, but are mentioned here because of their importance for vector machines, which may be making a comeback.

General Concepts

A computer's architecture is a high-level framework for the components making up a computer system and their interconnection. The features that matter most for scientific computing are the memory system, the bus structure, the internal CPU design, and the I/O systems. For parallel machines this is extended to include the interconnection network (topology) for the processors. You need to understand enough architecture to

Scientific computations involve large amounts of data. Consider climate modeling, a modern application: given latitude, longitude, elevation, and time, we want to model the temperature, pressure, humidity, and velocity of the air as time goes on. We might also propagate chemical species, since those have an effect on weather.

Discretize this by laying down a 1 km x 1 km mesh, with 11 mesh points in the vertical direction. This gives ~ 5 x 10^9 mesh points. Output for a single time value can require ~ 4.5 x 10^10 doubles. Note that the data can be arranged in a large multidimensional array.
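The arithmetic behind these estimates can be checked with a short sketch. The Earth surface area used here (about 5.1 x 10^8 km^2) is an assumed round figure, not a value from these notes, and the names `mesh_points` and `output_bytes` are hypothetical helpers introduced only for illustration.

```python
# Rough size estimate for the climate-model mesh described above.
# EARTH_SURFACE_KM2 is an assumed round figure, not a value from the notes.
EARTH_SURFACE_KM2 = 5.1e8   # approximate surface area of the Earth in km^2
VERTICAL_LEVELS = 11        # mesh points in the vertical direction (from the text)
BYTES_PER_DOUBLE = 8        # size of an IEEE 754 double-precision value


def mesh_points(surface_km2=EARTH_SURFACE_KM2, levels=VERTICAL_LEVELS):
    """One column of `levels` points for every 1 km x 1 km surface cell."""
    return surface_km2 * levels


def output_bytes(num_doubles):
    """Storage needed for `num_doubles` double-precision values."""
    return num_doubles * BYTES_PER_DOUBLE


if __name__ == "__main__":
    print(f"mesh points: {mesh_points():.2e}")
    # 4.5e10 doubles per time value is the figure quoted in the text.
    print(f"one time value: {output_bytes(4.5e10) / 1e9:.0f} GB")
```

Under these assumptions the mesh has roughly 5.6 x 10^9 points, consistent with the estimate above, and a single time value occupies about 360 GB. That is far larger than any cache, which is why the movement of data dominates the discussion that follows.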

The primary bottleneck in scientific computing is moving data, not operating on it. We will analyze this and see how it works by using some small kernels that operate on vectors and arrays, e.g.,
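Typical kernels of this kind include a vector update, a dot product, and a matrix-vector product. The choice of these three, and the function names `axpy`, `dot`, and `matvec`, are assumptions made for illustration. The sketch below uses Python only for readability, and writes the loops out explicitly so that the loads, stores, and floating-point operations per iteration can be counted by hand.

```python
def axpy(alpha, x, y):
    """Vector update y <- alpha*x + y, done in place.

    Per iteration: 2 flops (multiply, add) and 3 memory references
    (load x[i], load y[i], store y[i]).
    """
    for i in range(len(x)):
        y[i] += alpha * x[i]
    return y


def dot(x, y):
    """Dot product of x and y.

    Per iteration: 2 flops and 2 loads; the running sum can stay in a register.
    """
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def matvec(a, x):
    """Matrix-vector product y = a*x, with a stored as a list of rows.

    For an m x n matrix: 2*m*n flops, and each of the m*n matrix
    entries is loaded exactly once, so matrix data is never reused.
    """
    y = [0.0] * len(a)
    for i, row in enumerate(a):
        total = 0.0
        for j in range(len(x)):
            total += row[j] * x[j]
        y[i] = total
    return y


if __name__ == "__main__":
    print(axpy(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(matvec([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
```

In each of these loops the number of memory references is at least comparable to the number of flops, so the rate at which operands arrive from memory, rather than the speed of the arithmetic units, tends to set the running time.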

These operations are more important than they might appear at first glance: they account for a large fraction of machine cycles in computational science and engineering. The pseudo-code used for examples assumes:

The first architectural aspect to consider is effective use of a memory hierarchy.
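To see why the order of memory accesses matters, consider a two-dimensional array stored row by row in one contiguous block. The sketch below is a toy model, not a description of any real cache: it assumes memory is fetched in fixed-size lines of `line_size` elements and counts how often consecutive accesses land in a different line. The row-major layout and all function names are assumptions made for illustration.

```python
def row_major_offset(i, j, ncols):
    """Position of element (i, j) in a row-major array with ncols columns."""
    return i * ncols + j


def traversal_offsets(nrows, ncols, by_rows=True):
    """Sequence of memory offsets touched when visiting every element once."""
    if by_rows:
        # Inner loop over j: consecutive accesses are 1 element apart.
        return [row_major_offset(i, j, ncols)
                for i in range(nrows) for j in range(ncols)]
    # Inner loop over i: consecutive accesses are ncols elements apart.
    return [row_major_offset(i, j, ncols)
            for j in range(ncols) for i in range(nrows)]


def count_line_changes(offsets, line_size):
    """Toy model: count accesses that fall in a different line than the previous one."""
    changes = 0
    previous_line = None
    for offset in offsets:
        line = offset // line_size
        if line != previous_line:
            changes += 1
            previous_line = line
    return changes


if __name__ == "__main__":
    by_rows = traversal_offsets(8, 8, by_rows=True)
    by_cols = traversal_offsets(8, 8, by_rows=False)
    print(count_line_changes(by_rows, line_size=4))  # 16
    print(count_line_changes(by_cols, line_size=4))  # 64
```

For an 8 x 8 array with 4 elements per line, the row-wise sweep changes lines 16 times while the column-wise sweep changes lines on all 64 accesses, even though both loops do exactly the same arithmetic. In a language that stores arrays column by column, such as Fortran, the roles of the two loop orders are reversed.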