Timings on Computers
Introduction and the Unix time Command
Accurately timing things on computers is essential for performance analysis,
identification of bottlenecks in programs, and experimental determination
of computational efficiency. For large scale problems it is also important
to track down
-
memory activity such as page faults, which occurs when the program
accesses data that is not in the computer's main memory. In this case,
the data needs to be read in from disk (or tape on
HPSS),
and can require 10-1000 times as long to access.
-
I/O activity such as periodic output of the state of the computation. Although
not always avoidable, sometimes this can be made more efficient by writing
output files in binary format, or by spawning a separate thread to handle
the output.
You may think you know just where memory and I/O activity is going on,
but the late 1980's Pablo project at UIUC showed that users are typically unaware
of the actual effects I/O
has on their programs. That has changed in the past 30 years, and now
computational scientists are even more unaware of the effects that I/O
imposes.
For timing, the Unix command time gives data for an entire program
(or more precisely, an entire Unix process). If the executable is called
``a.out'', then a typical result would look like
% time a.out
2648.1u 330.0s 52:43 94% 0+12372k 347+21447io 347pf+0w
The breakdown of the output line is:
-
2648.1u: seconds of user time devoted to process. User time is the time
the CPU spent executing your code (and not other processes that may have
been swapped in and out during your code's execution).
-
330.0s: seconds of system time devoted to the process. System time is time
spent in executing system calls. A major source of system time is I/O
- a memory access that goes to or from a disk requires calling the operating
system to perform the action, and this is charged to the system time. Another
source is calling a timing routine within your code.
-
52:43: elapsed wall-clock time. This means 52 minutes, 43 seconds and is
what would be recorded if you used a stopwatch.
-
94% : Percent of machine's CPU resources devoted to your process. This
roughly means that during the 52:43 wall clock time, your code was being
executed 94% of the time, and other code was running 6%. If this number
is 50% or less, then possibly the timings are not reliable because your
program was being swapped in and out of the CPU, to give time slices to
other user's programs.
-
0+ : Average amount of shared memory in kbytes (for shared libraries).
-
12372k : Average amount of unshared memory in kbytes. This indicates the
program used 12.4 Mbytes of memory.
-
347+ : Number of block input operations.
-
21447io : Number of block output operations. Output is buffered; the operating
system waits until you have accumulated enough output to a file to make
it worthwhile to actually execute a write to disk (which is relatively
expensive in terms of time).
-
347pf+ : Number of page faults. This indicates how many times during execution
the program tried to access data that was not available in main memory
and instead resided on a disk. For this example, most of the page faults
probably only occured on the initialization
phase of this program. Note that the number matches the number of block
input operations, making that conjecture even more likely.
-
0w : Number of times your job is moved out of the game for another job
to run on the CPU. This can indicate a heavily loaded machine (for
which the only cure is to buy a more modern one), or it can indicate that
your program is making several "system calls". A system call occurs when
your program has to request the operating system to execute some command,
such as an I/O operation or to read/set a system variable like the clock.
In this case, the OS regards it as a great opportunity to move your job
out and another one in. Why is this shown as 0, but the code had the
CPU for only 94% of the time? The answer involves ....
Problems with the Unix time command
First, not all of the fields may be implemented by a given computer.
This is not a problem that the user can deal with; it is up to the vendor.
Obviously this can lead to misleading results. In the example results shown
above,
were there really no swaps, or does the OS interrupt your job and
swap it out so often (even on a lightly loaded machine) that the vendor
finds it too embarrasing to let a user find it out easily? [BTW: that actually
was the case for the example above].
Another weakness: while the time command may show a code has achieved
99.9% utilization of
the CPU during the execution, that is just the fraction of time spent
on your process, not the fraction of utilization you are getting of the
theoretical peak of the machine. A vector update operation can get
99.9% CPU utilization over 217 seconds ... and yet just be running at
5% of the possible peak rate.
More deadly, the numbers are for the complete process. More typically
an algorithm's time is being examined, so the time it takes
to read data in, to output diagnostics and results, etc. should be excluded.
Or those sections also need to be isolated into individual segments and
timed. Even more typically, if a code is taking
a long time, you want to know
why and where it is taking a long time. That
means finding where the time gets spent, in some case all the way down
to which individual loops or function calls are the guilty parties.
So timers which can be invoked from within a program are needed.
Next page:
Finding resource usage
- Started change tracking: Mon 14 Sep 2009, 01:58 PM
- Modified: Thu 25 Aug 2011, 02:41 PM updated times
- Modified: Mon 12 Sep 2011, 11:41 AM stopped stuttering
- Last Modified: Mon 12 Sep 2011, 11:42 AM