Finding Timer Resolution and Overhead

Measuring the resolution and overhead of your timer is important: it tells you how many times to repeat an operation to get reliable timings. Increasingly, the overhead of calling the timer is negligible, but resolution is far from it, and increasingly fast processors mean a significant amount of computation can be performed in a single clock tick. The clock meant here is not the hardware system clock that sends out the drumbeat at whatever GHz rating your box has. Instead it is the clock that you can actually call from C/C++/Fortran code.

Measuring the Resolution of a Timer

To find timer resolution, use the computer equivalent of the Millikan oil drop experiment. Time an operation that takes some small amount of time, and repeat it several times. If the operation's duration is close to the resolution, some timings will show zero while others will show up as taking a small amount of time, which is an integer multiple of the clock resolution. Actually, the difference between any two timings is an integer multiple of the clock resolution, so even if the timer is high resolution and no zero times occur, all of the timings obtained will be multiples of the clock resolution. So take several measurements, plot them, and look for them to line up in horizontal bands.

An example algorithm using an elapsed time clock and integer addition is:


           initialize:
               nsamples = 3333333
               noperations = 7
        
           iterate:
           for k = 1, ... , nsamples
               time_start = clock_time()
               sum = 0
               for i = 1, ... ,  noperations
                     sum = sum + 1
               end for
               time_end = clock_time()
               time_required = time_end - time_start
               write out time_required
           end for

If roughly 50-90% of the times are nonzero, then the smallest positive time is probably the clock resolution. A good value for noperations is machine dependent. Start out with noperations = 100, but don't be surprised if a value like 100000 or larger is needed. A fast machine and low resolution clock will require many more operations to get the clock to tick over. On some modern systems the clock has high enough resolution that even setting noperations = 0 will not cause any zero timings to appear. If that is the case, function call overhead costs dominate and life is good. Even in this case the resolution can still be estimated using the measured timings.
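The sampling loop above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: time.perf_counter stands in for clock_time(), the helper names sample_timings and fraction_nonzero are made up for this sketch, and the Python interpreter adds far more per-operation cost than compiled C/C++/Fortran would, so a suitable noperations will be very different here.

```python
import time

def sample_timings(nsamples=100_000, noperations=100, clock=time.perf_counter):
    """Collect nsamples elapsed times, each spanning noperations integer additions."""
    timings = []
    for _ in range(nsamples):
        time_start = clock()
        total = 0
        for _ in range(noperations):
            total = total + 1
        time_end = clock()
        timings.append(time_end - time_start)
    return timings

def fraction_nonzero(timings):
    """Fraction of the timings that are nonzero (the text suggests roughly 0.5 to 0.9)."""
    return sum(1 for t in timings if t != 0) / len(timings)

if __name__ == "__main__":
    timings = sample_timings()
    positive = [t for t in timings if t > 0]
    print("fraction nonzero:", fraction_nonzero(timings))
    if positive:
        # Candidate for the clock resolution, subject to the caveats in the text.
        print("smallest positive time:", min(positive))
```

The clock is passed in as a parameter so that a different timer (or a fake one, for checking the logic) can be swapped in without touching the loop.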


Example 1: For timings of

[ 1.00    2.00    1.75     0.50    1.25   3.25 ]
seconds, the resolution is at least as small as 0.25 seconds because that is the largest value that all of the timings are integer multiples of. Resolution could be smaller than 0.25, e.g., the timings shown are also integer multiples of 0.05. Without an additional timing measurement that gives something like 6.10 seconds, it can only be stated that 0.25 is an upper bound on (and an integer multiple of) the timer resolution.


Example 2: If the vector of measured timings is

[0  1.953125e-3  1.953125e-3   3.906250e-3  4.8828125e-3   0   0   0 ] 
the resolution is 9.765625e-4. If this number looks strange, crank up Matlab or some calculator and look at its inverse 1/9.765625e-4. These numbers came from an old HP PC and were not just concocted for pedagogy. How on earth was the number 9.765625e-4 extracted from those timings? In Matlab look at the consecutive differences in the sorted timing vector: d = diff(sort(timings)), and extract the smallest nonzero from it. Doing so is not guaranteed to give the clock resolution.


Example 3: Using Matlab's diff(sort()) on timings

[0.00    2.00    1.25     0.50    1.25   3.25 ]
shows that the smallest difference is 0.50, while the clock resolution is obviously 0.25 or less. However, other than cooked-up examples like this, I've never encountered a machine where taking a large number of measurements and using diff() and sort() has failed to yield the correct resolution. Sidenote: it is possible to extract the 0.25 from the set of data ... try to figure out how, and how to guarantee that it will work in general.
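For readers without Matlab, the diff(sort()) step used in Examples 2 and 3 could be sketched in Python as below. The function name smallest_nonzero_gap is invented for this sketch. It reproduces only the smallest-nonzero-difference heuristic, so it has the same weakness that Example 3 demonstrates.

```python
def smallest_nonzero_gap(timings):
    """Smallest nonzero difference between consecutive sorted timings.

    Mirrors Matlab's d = diff(sort(timings)) followed by taking the
    smallest nonzero entry. Returns None if all timings are equal.
    """
    ordered = sorted(timings)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    nonzero = [g for g in gaps if g > 0]
    return min(nonzero) if nonzero else None

# Timings from Example 2 and Example 3 in the text.
example2 = [0, 1.953125e-3, 1.953125e-3, 3.906250e-3, 4.8828125e-3, 0, 0, 0]
example3 = [0.00, 2.00, 1.25, 0.50, 1.25, 3.25]

print("Example 2:", smallest_nonzero_gap(example2))
print("Example 3:", smallest_nonzero_gap(example3))
```

The values in both examples happen to be exactly representable in binary floating point, so the subtractions are exact. With real measurements, tiny rounding differences can show up as spurious small gaps, and working in integer clock ticks or nanoseconds avoids that.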


Example 4: Here are some timings taken on a 3.2 GHz Intel Core i7. First, the raw timings using a blue + sign for each datum:

[rawtimes.png]

After sorting the timings into non-decreasing order:

[sortedtimes.png]

The consecutive differences between elements in the sorted timing vector:

[diffs.png]

At this point, you should be able to read off the clock resolution from the above graphs. Go no further until you figure it out and understand the plots. The remaining material is of little or no use to you otherwise.

Warning 1: do not trust manual pages or vendors' claims about clock resolution. They lie blatantly, often, and shamelessly, sometimes to cover up weaknesses, and sometimes just to avoid headaches. E.g., "POSIX requires 1/100 second resolution, our clock has nanosecond resolution, so if we just leave the documentation saying 1/100, then we're safe and won't have obnoxious professors complaining that their measurements show a 2 nanosecond resolution." Even in the rare event that a vendor tells the truth, the resolution you can actually measure is a better guide and more useful than a theoretical number. There may be some unavoidable and erratic costs that occur sporadically, e.g., ones related to the OS accessing a hardware clock.

Warning 2: The examples have only a few timings shown, but in practice use a gazillion timings (a gazillion means "a lot of" or "gobs and gobs of"). Nowadays (circa 2019 C.E.) use a million or more, that is, nsamples = 1000000. Doing so on a 2.8 GHz Intel Core i7 system takes less than 40 seconds. Also in less than 40 seconds, Matlab can slurp in, analyze, and plot that much data.


Timer Overhead Measurement

How much does calling the timer itself perturb timing results? Calling a timer is a function call, which involves some work: pushing the state of the process onto a stack, loading registers, and reading data (the fields in memory where the timer data is kept) which may in turn involve a page fault or cache miss. Invoking a timer function usually requires a call to the operating system, and process schedulers take advantage of that to swap in and run any processes waiting for the CPU. So do not assume that the overhead of a timer function is the same as that of any other function. Timer overhead can be found by calling the timer many times in a single timing block, with the number of calls chosen large enough that the whole block spans many multiples of the clock resolution found earlier.
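A minimal sketch of that overhead measurement in Python follows. It assumes time.perf_counter as the timer under test, and the name timer_overhead is hypothetical. The estimate it returns includes the cost of the loop itself and of Python's function-call machinery, so treat it as an upper bound on the per-call cost rather than an exact figure.

```python
import time

def timer_overhead(ncalls=1_000_000, clock=time.perf_counter):
    """Estimate the average cost of one clock() call.

    Calls the timer ncalls times inside a single timing block and divides
    the elapsed time by ncalls. Loop overhead is included in the result.
    """
    time_start = clock()
    for _ in range(ncalls):
        clock()  # the call whose cost is being measured
    time_end = clock()
    return (time_end - time_start) / ncalls

if __name__ == "__main__":
    print("approximate seconds per timer call:", timer_overhead())
```

Pick ncalls so that the total elapsed time is many times the measured clock resolution. Otherwise the quantization of the clock dominates the estimate.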

Applying knowledge about timing methodologies