Introduction to Parallel Computing


Modern research applications continue to challenge computers with the size and amount of computation they require. Those applications include weather modeling, climate models, fusion energy simulations, and crashworthiness testing of proposed automobiles. All other aspects of computing must be matched to the computation rate: I/O, memory sizes, archival store/retrieve, visualization, and networks. The principles and techniques covered in this class will in many cases apply to those parts of large-scale applications as well. The most central principle has held from the beginning of digital computing: calculations with data are not the expensive part or even the limiting factor; the movement of data is. For serial (single-processor) applications, that means moving data between hard drives, main memory, caches, and the processors. Parallel computing adds another layer to this data hierarchy, since with few exceptions it requires moving data from one machine to another during a single application.


The traditional computer model is a "von Neumann machine": instructions are executed sequentially in a repeated cycle:

  1. Instruction fetch and decode
  2. Addresses of operands calculated
  3. Operands fetched from memory
  4. Operation performed using operands
  5. Results written back to memory
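
The five steps above can be sketched as a loop over a toy machine. This is only an illustrative sketch, not a model of any real processor: the instruction format (a tuple of opcode and three memory addresses), the opcodes ADD, MUL, and HALT, and the function name run are all assumptions made for this example.

```python
# Hypothetical toy machine: one memory list holds both instructions and data.
# Each instruction is a tuple (opcode, addr_a, addr_b, addr_dest).

def run(memory, pc=0):
    """Execute instructions one at a time until HALT is fetched."""
    while True:
        # 1. Instruction fetch and decode
        opcode, a, b, dest = memory[pc]
        if opcode == "HALT":
            return memory
        # 2. Addresses of operands calculated
        #    (direct addressing is assumed here, so this step is trivial)
        addr_a, addr_b, addr_dest = a, b, dest
        # 3. Operands fetched from memory
        x, y = memory[addr_a], memory[addr_b]
        # 4. Operation performed using operands
        if opcode == "ADD":
            result = x + y
        elif opcode == "MUL":
            result = x * y
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        # 5. Results written back to memory
        memory[addr_dest] = result
        pc += 1  # strictly sequential: move on to the next instruction


program = [
    ("ADD", 3, 4, 5),   # memory[5] = memory[3] + memory[4]
    ("MUL", 5, 5, 6),   # memory[6] = memory[5] * memory[5]
    ("HALT", 0, 0, 0),
    2, 3, 0, 0,         # data cells at addresses 3..6
]
final = run(list(program))
print(final[5], final[6])  # → 5 25
```

Note that every instruction in this sketch touches memory several times (one fetch of the instruction, two operand reads, one write) for a single arithmetic operation, which is one way to see why data movement rather than calculation tends to be the limiting factor.
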

Hardware improvements continue to increase speed exponentially, but cutting-edge large-scale applications still require harnessing multiple systems operating simultaneously. Further improvements can come from parallelism, which takes several forms. Parallelism was introduced early on in machines, particularly to allow memory accesses to be overlapped with instruction execution.

Effective use of parallelism requires an integrated CS approach, involving:

First, some partitioning of the design space of parallel architectures is needed.

