Common Unix Timers


There are several Unix timers: etime, dclock, mclock, gettimeofday, gethrtime, etc. For portability it is best to write a function elapsedtime which is called throughout your codes. That way only a single function needs to be changed when moving the code to another machine.
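
As a concrete example, here is a minimal sketch of such a wrapper in Fortran, built on the standard system_clock routine described later in this section (the name elapsedtime is just a convention, not a library routine):

    ! Portable wrapper: the rest of the code calls only elapsedtime(),
    ! so moving to a machine with a different native timer means
    ! changing just this one function.
    function elapsedtime() result(seconds)
       implicit none
       double precision :: seconds
       integer :: count, count_rate, count_max
       call system_clock(count, count_rate, count_max)   ! wall clock counts
       seconds = dble(count) / dble(count_rate)          ! convert counts to seconds
    end function elapsedtime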

Four major issues occur when using a timer:

  1. What is the clock resolution, that is, how small a time interval can be measured?
  2. What is the overhead of calling the timer, that is, how much does it add to the program's execution time?
  3. How much does it perturb the results? A timer that performs a system call may flush the cache, for example, and then the computer incurs an extra cost reloading data from main memory into the cache. [The cache is a small fast memory that contains frequently used data, to cut down on memory access times. Details about caches will be covered later in the course.]
  4. What does the timer return: an elapsed time, or the time since the last call (a delta)?
For the distinction between an elapsed time clock and a delta clock: an elapsed time clock returns the time since some fixed starting point, so it is used as
   time_1 = elapsedtime()
       (Stuff to be timed goes here)
   time_for_Stuff = elapsedtime() - time_1
while a delta clock returns the time since the last call to it, so the first call merely resets the clock (its return value is discarded) and it is used as
   time_1 = deltatime()
       (Stuff to be timed goes here)
   time_for_Stuff = deltatime()
Usually you want to find elapsed CPU time instead of wall clock time. This is particularly true for large scale scientific computing, where your job may spend a large amount of wall clock time swapped out. Also, if one is available, an elapsed time clock is preferred to a delta clock. You can always implement one given the other (and it is a useful exercise to figure out how!), but the common approach to timing parts of a program is to time a few sections that you suspect account for most of the time, and then to subtract the sum of those from the overall time to find the time spent in "everything else" - bookkeeping that an elapsed time clock handles naturally, as sketched below.
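
As a sketch of that bookkeeping, with a solver phase and an output phase standing in for whatever sections you suspect dominate (the names are placeholders, not routines from any library):

   time_0      = elapsedtime()
   time_1      = elapsedtime()
       (solver section goes here)
   time_solver = elapsedtime() - time_1
   time_1      = elapsedtime()
       (output section goes here)
   time_io     = elapsedtime() - time_1
   time_total  = elapsedtime() - time_0
   time_other  = time_total - (time_solver + time_io)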

The terms "timing block" or "timing interval" refer to the chunk of code between two calls to a timer function, the part of the code that is specified as "(Stuff to be timed goes here)" above. That chunk may be a loop or a invocation of a function. Clock resolutions typically range from around a few nanoseconds, which is considered ``high resolution'', up to 0.01 seconds, which is considered pretty sloppy. The overhead (function call penalty) for calling a timer on modern (≥ 2012 CE) machines is now almost negligble, but that changes from year to year, so measuring it is the only reliable method that is guaranteed to work in the future.

To avoid problems with resolution and overhead, follow these general rules whenever possible:

  1. Make sure that the interval between calls to the timer is ≥ 100*(resolution + overhead). That typically means increasing the number of operations inside the timing segment until the measured time interval satisfies the inequality (a sketch combining this with rule 2 follows the list).
  2. Make multiple timing runs and take the average, but look for unusual maxima or minima. Plotting 10 million data points in Matlab takes less than one second on a 3.2 GHz Intel Core i7 machine with an Nvidia GeForce GTX 295 graphics card (in 2019), so looking at all of the data points instead of an average is fast and easy - and it can save you some embarrassment (other times for plotting are given below).
  3. Since timing routines vary from machine to machine, write your own wrapper timer which simply invokes the native timer. Although that adds an extra layer in the function invocation (and so some extra overhead), it allows you to move your code to another machine with another timer by simply changing the timer name in one place.
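
Here is a sketch that combines rules 1 and 2, using the resolution and overhead estimates from the previous sketch and a hypothetical kernel do_work() that is too fast to time on its own:

    integer :: i, k, nrep
    integer, parameter :: nsamples = 20          ! arbitrary number of repeat runs
    double precision :: time_1, interval, sample(nsamples)
    double precision :: resolution, overhead     ! obtained as in the previous sketch
    double precision, external :: elapsedtime    ! the wrapper sketched earlier

    ! Rule 1: grow the amount of work until one timing interval is long
    ! enough relative to the timer's resolution and overhead.
    nrep = 1
    do
       time_1 = elapsedtime()
       do i = 1, nrep
          call do_work()                         ! hypothetical kernel being timed
       end do
       interval = elapsedtime() - time_1
       if (interval >= 100.0d0*(resolution + overhead)) exit
       nrep = 2*nrep                             ! interval too short: double the work
    end do

    ! Rule 2: take several measurements so outliers can be spotted.
    do k = 1, nsamples
       time_1 = elapsedtime()
       do i = 1, nrep
          call do_work()
       end do
       sample(k) = (elapsedtime() - time_1) / dble(nrep)   ! time per call to do_work
    end do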

Why not use standard statistical techniques for judging the quality of a timing? The brief answer is that you should, but doing so requires rather sophisticated statistical methods and cannot be done blindly. Timings rarely follow a normal (Gaussian) curve - why? Furthermore, they frequently cluster around discrete quanta corresponding to system events happening or not happening (like swaps). The real question is why not look at all of the timing data? Plotting 100k data points in Matlab takes < 0.25 seconds on a five year old workstation from 2006, and plotting 1M data points takes about 0.29 seconds. It's dumb not to look at the data for outliers and strange values when it can be done faster than you can read this sentence. You can right-click and download the Matlab script used to find those timings, and try some timings yourself to see how much data your system can realistically handle.

C/C++ have several timers, but the resolution is sometimes only claimed (by the header file or man pages) to be 0.01 seconds, the POSIX standard value. In practice, many C/C++ systems have much better resolution than that; you have to measure it to find out the actual value. The Fortran language standard requires vendors to provide a subroutine (a function that returns void, for C/C++ folks) that returns the clock's resolution and other information. This is OS-independent, making it practical to use across platforms as well. The routine that provides wall clock time is
   subroutine system_clock(count, count_rate, count_max)
   integer :: count, count_rate, count_max
where count is the current clock count, count_rate is the number of counts per second, and count_max is the largest value count can reach before rolling over.

Although the interface is platform independent, the actual values that system_clock() returns depend on the compiler, compiler options, and even what type of integers are used. The Intel ifort compiler on most Intel chips returns count_rate = 10000 = 10k. But if the variables are declared as integer(kind=selected_int_kind(18)) :: count, count_rate, count_max (which makes them 8-byte integers), count_rate = 1000000 = 1M. The gfortran compiler on the same platform returns count_rate = 1000. count_max is the maximum value of count; this is typically 2147483647 or 9223372036854775807, numbers that should be immediately recognizable. As a hint, add one to each and then take the log base 2.
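
Whatever values your compiler reports, the conversion from counts to seconds is the same: take the difference of two counts and divide by count_rate. A minimal sketch that also prints the rate and maximum so you can see what your own compiler provides:

    program show_clock
       implicit none
       integer :: count_start, count_end, count_rate, count_max
       double precision :: seconds
       call system_clock(count_start, count_rate, count_max)
       print *, 'count_rate =', count_rate, '  count_max =', count_max
       ! ... (stuff to be timed goes here) ...
       call system_clock(count_end)
       seconds = dble(count_end - count_start) / dble(count_rate)  ! ignores rollover; see below
       print *, 'elapsed wall clock time (seconds):', seconds
    end program show_clock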

The count_max is important because of ...
Negative times. Scientific codes often run for a long time, possibly days or even weeks. Even on a smaller scale it often happens that the clock rolls over, and the difference between end time and start time is a negative number. When that happens, if the rollover occurred only once, the correct time is given by

    call system_clock(count_start, count_rate, count_max)
    ...
    call system_clock(count_end, count_rate, count_max)
    time_used = count_end - count_start
    if (time_used <= 0) time_used = time_used + count_max   ! correct for a single rollover
If the timed code takes a really long time, then the clock may have rolled over multiple times. In that case insert additional timer calls into the timed section, and try to keep track of how many roll-overs occurred. My recommendation is in C to use the Unix epoch time (which won't roll over until 2038), and in Fortran to use date_and_time(). And arrange for 2038 to be a vacation year for yourself. Another option is to use 64-bit integers for counting seconds. Those won't roll over until 4 December 292277026596 CE, and it's a safe bet none of us will be around then.
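
In Fortran that last option can be as simple as changing the kind of the count variables, since (as noted above) the 8-byte integer version of system_clock on common compilers reports a much larger count_max along with a finer count_rate. A sketch, assuming the compiler behaves as described above:

    integer, parameter :: i8 = selected_int_kind(18)       ! 8-byte integers
    integer(kind=i8)   :: count_start, count_end, count_rate, count_max
    double precision   :: seconds

    call system_clock(count_start, count_rate, count_max)  ! count_max is now ~9.2e18
    ! ... even a weeks-long computation fits here without a rollover ...
    call system_clock(count_end)
    seconds = dble(count_end - count_start) / dble(count_rate)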