Timings on Computers


Introduction and the Unix time Command

Accurately timing things on computers is essential for performance analysis, for identifying bottlenecks in programs, and for experimentally determining computational efficiency. For large-scale problems it is also important to track down where memory and I/O activity is going on. You may think you know just where that activity occurs, but the late 1980s Pablo project at UIUC showed that users are typically unaware of the actual effects I/O has on their programs. That has changed in the past 30 years: now computational scientists are even more unaware of the effects that I/O imposes.

For timing, the Unix command time in your default shell gives data for an entire program (or more precisely, an entire Unix process). If the executable is called "a.out", then a typical result under an sh, csh, or tcsh shell looks like

   % time a.out
   2648.1u 330.0s 52:43 94% 0+12372k 347+21447io 347pf+0w
The breakdown of the output line is:
  1. 2648.1u: seconds of user time devoted to the process. User time is the time the CPU spent executing your code (and not other processes that may have been swapped in and out during your code's execution).
  2. 330.0s: seconds of system time devoted to the process. System time is time spent executing system calls. A major source of system time is I/O: a memory access that goes to or from a disk requires calling the operating system to perform the action, and this is charged to the system time. Another source is calling a timing routine within your code. (A short program demonstrating the user/system split appears at the end of this section.)
  3. 52:43: elapsed wall-clock time. This means 52 minutes, 43 seconds and is what would be recorded if you used a stopwatch.
  4. 94% : Percent of the machine's CPU resources devoted to your process. This roughly means that during the 52:43 of wall-clock time, your code was being executed 94% of the time, and other code was running 6%. If this number is 50% or less, the timings may not be reliable, because your program was being swapped in and out of the CPU to give time slices to other users' programs.
  5. 0+ : Average amount of shared memory in kbytes (for shared libraries).
  6. 12372k : Average amount of unshared memory in kbytes. This indicates the program used 12.4 Mbytes of memory.
  7. 347+ : Number of block input operations.
  8. 21447io : Number of block output operations. Output is buffered; the operating system waits until you have accumulated enough output to a file to make it worthwhile to actually execute a write to disk (which is relatively expensive in terms of time).
  9. 347pf+ : Number of page faults. This indicates how many times during execution the program tried to access data that was not available in main memory and instead resided on a disk. For this example, most of the page faults probably occurred during the program's initialization phase. Note that the number matches the number of block input operations, which makes that conjecture even more likely.
  10. 0w : Number of times your job was swapped out so another job could run on the CPU. This can indicate a heavily loaded machine (for which the only cure is to buy a more modern one), or it can indicate that your program makes many system calls. A system call occurs when your program has to request that the operating system execute some command, such as an I/O operation or reading/setting a system variable like the clock. In that case, the OS regards it as a great opportunity to move your job out and another one in. Why is this shown as 0 when the code had the CPU for only 94% of the time? The answer involves ....
Also, be aware that just typing "time" will generally invoke your shell's built-in timer. The same information can be obtained with /usr/bin/time instead, and its -v option displays the fields in a more understandable format.
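
To see the user/system split for yourself, here is a minimal C sketch (the program name and loop counts are made up for illustration): the first loop is pure computation and is charged to user time, while each write() must enter the operating system and is charged to system time. Compile it and run it under time or /usr/bin/time -v.

   /* usersys.c -- contrast user time and system time (illustrative only;
      the loop counts are arbitrary, so adjust them to your machine). */
   #include <fcntl.h>
   #include <unistd.h>

   int main(void)
   {
       double s = 0.0;
       char byte = 'x';
       int fd = open("/dev/null", O_WRONLY);   /* discard the output */

       /* Pure computation: charged to user time. */
       for (long i = 1; i <= 100000000L; i++)
           s += 1.0 / (double) i;

       /* Many tiny writes, each a system call: charged to system time. */
       for (long i = 0; i < 200000L; i++)
           write(fd, &byte, 1);

       close(fd);
       return (s > 0.0) ? 0 : 1;   /* use s so the loop is not optimized away */
   }

On most systems the first loop shows up almost entirely as user time and the write loop almost entirely as system time, though the exact balance depends on the machine and operating system.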

Problems With Using Unix time

First, not all of the fields may be implemented on a given computer. This is not a problem that the user can deal with; it is up to the vendor. Obviously this can lead to misleading results. In the time example above, were there really no swaps, or does the OS interrupt your job and swap it out so often (even on a lightly loaded machine) that the vendor finds it too embarrassing to let a user find out easily? [BTW: that actually was the case for the example above.]

Another weakness: while the time command may show you got 99.9% utilization of the CPU during the execution, that is just the fraction of time spent on your process, not the fraction of the machine's theoretical peak rate that you achieved. A vector update operation can get 99.9% CPU utilization over 217 seconds ... and yet be running at just 5% of the possible peak rate.
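
This is easy to check. The sketch below times a vector update y = y + a*x using the POSIX clock_gettime() routine and reports the achieved Mflop rate; the vector length and the assumed peak of 4000 Mflop/s are hypothetical numbers chosen for illustration, not properties of any particular machine. The update performs 2 floating-point operations per element, so the achieved rate is 2N divided by the elapsed time; on most machines this loop is limited by memory traffic, and the result lands far below peak even though the CPU was busy the whole time.

   /* vecupdate.c -- achieved flop rate of a vector update (illustrative;
      N and PEAK_MFLOPS are assumptions, not measurements). */
   #include <stdio.h>
   #include <stdlib.h>
   #include <time.h>

   #define N           10000000L   /* vector length (arbitrary) */
   #define PEAK_MFLOPS 4000.0      /* assumed machine peak, in Mflop/s */

   int main(void)
   {
       double *x = malloc(N * sizeof(double));
       double *y = malloc(N * sizeof(double));
       double a = 3.0;
       struct timespec t0, t1;

       if (x == NULL || y == NULL) return 1;
       for (long i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

       clock_gettime(CLOCK_MONOTONIC, &t0);
       for (long i = 0; i < N; i++)        /* 2 flops per element */
           y[i] = y[i] + a * x[i];
       clock_gettime(CLOCK_MONOTONIC, &t1);

       double secs   = (t1.tv_sec - t0.tv_sec)
                     + 1.0e-9 * (t1.tv_nsec - t0.tv_nsec);
       double mflops = 2.0 * (double) N / secs / 1.0e6;

       printf("%.4f s, %.1f Mflop/s, %.1f%% of assumed peak (y[0] = %g)\n",
              secs, mflops, 100.0 * mflops / PEAK_MFLOPS, y[0]);
       free(x); free(y);
       return 0;
   }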

More deadly, the numbers are for the complete process. Typically you are interested in an algorithm's time and want to exclude the time it takes to read data in, to write out diagnostics and results, etc. Or you want those sections timed as well, and need to isolate the individual segments. Even more typically, if a code is taking a long time, you want to know why and where. That means finding where the time gets spent, in some cases all the way down to which individual loops or function calls are the guilty parties. So we need timers that can be invoked from within your program, as in the sketch below.
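
As a first step in that direction, here is a C sketch of timing individual segments from inside a program: a small wall-clock helper built on clock_gettime() is read before and after each phase, so input, computation, and output are reported separately. The three phase functions are hypothetical stand-ins for real code.

   /* phases.c -- time individual program segments (a sketch; the three
      phase functions are hypothetical stand-ins for real code). */
   #include <stdio.h>
   #include <time.h>

   /* wall-clock seconds since an arbitrary origin */
   static double wall(void)
   {
       struct timespec t;
       clock_gettime(CLOCK_MONOTONIC, &t);
       return t.tv_sec + 1.0e-9 * t.tv_nsec;
   }

   static void read_input(void)   { /* ... real input here ...     */ }
   static void solve(void)        { /* ... real algorithm here ... */ }
   static void write_output(void) { /* ... real output here ...    */ }

   int main(void)
   {
       double t0 = wall(); read_input();
       double t1 = wall(); solve();
       double t2 = wall(); write_output();
       double t3 = wall();

       printf("input : %8.3f s\n", t1 - t0);
       printf("solve : %8.3f s\n", t2 - t1);
       printf("output: %8.3f s\n", t3 - t2);
       return 0;
   }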


Next page: getrusage()