Finding Timer Resolution and Overhead

Measuring resolution and overhead for your timer is important - it lets you know how many times to repeat an operation in order to get reliable timings.

Timer Resolution Measurement

You can find timer resolution by the computer equivalent of the Millikan Oil Drop Experiment. Time an operation that takes some small amount of time, and do it several times. If the time for that operation is close to the resolution, some timings will show zero while others will show a small positive time, which is an integer multiple of the clock resolution. Even if the timer is high resolution and no zero times occur, all of the timings obtained will be multiples of the clock resolution. So take several measurements, plot them, and look for them to line up in bands. An example algorithm using an elapsed time clock and integer addition is:
   Initialize:
     nsamples = 20
     noperations = ???
   For k = 1, ... , nsamples
       time_start = mytime()
       sum = 0
       For i = 1, ... ,  noperations
             sum = sum + 1
       End for
       time_required = mytime() - time_start
       Write out time required
   End for
If roughly half of the times are nonzero, then the smallest positive time is probably the clock resolution. The setting of noperations is machine dependent - I would recommend starting out with 100, but don't be surprised if you need to use a value like 100000 or larger - a fast machine and low resolution clock will require many more operations. Also, on some modern systems the clock has high enough resolution that even setting noperations = 1 will not cause any zero timings to appear. If that is the case, timer overhead dominates and you can count yourself lucky. You can still estimate the resolution from the measured timings, since the distinct values will be spaced by multiples of the resolution.
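
As a rough illustration, the algorithm above might be written in Python as follows. This is only a sketch: time.perf_counter stands in for mytime(), the names sample_timings and estimate_resolution are made up for this example, and noperations = 100 is just the suggested starting value, not a recommendation for any particular machine.

```python
import time

def sample_timings(noperations, nsamples=20, clock=time.perf_counter):
    """Time `noperations` integer additions, repeated `nsamples` times."""
    timings = []
    for _ in range(nsamples):
        time_start = clock()
        total = 0
        for _ in range(noperations):
            total = total + 1
        timings.append(clock() - time_start)
    return timings

def estimate_resolution(timings):
    """Smallest positive timing, or None if every timing was zero."""
    positive = [t for t in timings if t > 0.0]
    return min(positive) if positive else None

# noperations = 100 is only a starting guess; adjust it for your machine.
timings = sample_timings(noperations=100)
for t in timings:
    print(f"{t:.3e}")
print("zero timings:", sum(1 for t in timings if t == 0.0))
print("smallest positive timing:", estimate_resolution(timings))
```

With a high resolution clock such as this one you will probably see no zero timings at all, in which case the smallest positive timing reflects overhead more than resolution.
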
Warning: do not trust the manual pages or vendors' claims about clock resolution. They lie blatantly, often, and without shame, sometimes to cover up weaknesses and sometimes just to avoid headaches. E.g., "POSIX requires 1/100 second resolution, our clock has nanosecond resolution, so if we just leave the documentation saying 1/100 we're safe and won't have obnoxious CS professors complaining that they're only measuring 3 nanosecond resolution."
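
One way to act on this warning is to print the resolution a system reports next to one you measure yourself. The sketch below does that for three of Python's standard clocks, using time.get_clock_info for the reported value. The helper name measured_tick and the sample count are assumptions made for this example, and the measured figure is only an upper bound on the true tick, because each gap between readings also includes the cost of the call itself.

```python
import time

def measured_tick(clock, nsamples=100000):
    """Smallest positive gap between back-to-back readings of `clock`."""
    smallest = None
    previous = clock()
    for _ in range(nsamples):
        current = clock()
        delta = current - previous
        if delta > 0.0 and (smallest is None or delta < smallest):
            smallest = delta
        previous = current
    return smallest  # None if the clock never advanced

clocks = [
    ("time", time.time),
    ("perf_counter", time.perf_counter),
    ("process_time", time.process_time),
]
for name, clock in clocks:
    reported = time.get_clock_info(name).resolution
    print(name, "reported:", reported, "measured:", measured_tick(clock))
```
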

Timer Overhead Measurement

How much does calling the timer itself perturb results? Calling a timer is a function call, which involves some work: pushing the state of the process onto a stack, loading registers, and reading data (the fields in memory where the timer data is kept) which may in turn involve a page fault or cache miss. We can test this timer overhead just by calling the timer many times. More useful is to measure it in terms of a common operation, as is done in the LAPACK project. The vector update (or daxpy) operation is

y = y + alpha * x,

where x and y are double precision vectors of length n, and alpha is a double precision number. Computing it takes 2n floating point operations (flops): one multiply and one add per element. Typically performance for the daxpy is measured in Mflops, millions of floating point operations per second.
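
Before turning to the daxpy-based measurement, the simpler test mentioned above (just calling the timer many times) can be sketched in Python. The function name timer_call_overhead and the call count are assumptions for this example, and time.perf_counter again stands in for mytime(). Note that the figure it reports also includes the cost of the loop itself, so treat it as an upper estimate.

```python
import time

def timer_call_overhead(clock=time.perf_counter, ncalls=1_000_000):
    """Average seconds per call to `clock`, from `ncalls` back-to-back calls."""
    time_start = clock()
    for _ in range(ncalls):
        clock()  # the call whose cost we want; the result is discarded
    elapsed = clock() - time_start
    return elapsed / ncalls

overhead = timer_call_overhead()
print(f"approximate overhead per timer call: {overhead:.3e} seconds")
```
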

   time_start = mytime()
   for k = 1, ... , repetitions
       y = y + alpha * x
   end for
   t1 = mytime() - time_start
   Mflops = (2.0e-6)*n*repetitions/t1

   time_start = mytime()
   for k = 1, ... , repetitions
       time = mytime()
       y = y + alpha * x
   end for
   t2 = mytime() - time_start

   timer_overhead = (t2 - t1)/repetitions
   cost_in_flops = 1.0e6 * Mflops * timer_overhead
In the above algorithm, the daxpy operation is written as y = y + alpha * x, but in most programming languages a loop will be needed to carry it out. Also note that the time t1 used to compute the Mflops rate itself includes the overhead of a timer call. We assume that this overhead is small compared to the cost of the whole timing loop, which can be assured by making the number of repetitions sufficiently large.
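
A minimal Python sketch of the algorithm above, with the daxpy written out as an explicit loop, might look like this. The vector length n, the repetition count, the value of alpha, and the function names are all assumptions chosen for illustration, and time.perf_counter stands in for mytime().

```python
import time

def daxpy(alpha, x, y):
    """In-place vector update y = y + alpha * x (2n flops for length n)."""
    for i in range(len(x)):
        y[i] = y[i] + alpha * x[i]

def measure_overhead(n=100, repetitions=10000, clock=time.perf_counter):
    alpha = 1.0e-9          # small, so y stays bounded over many repetitions
    x = [1.0] * n
    y = [0.0] * n

    # Loop 1: daxpy only, giving the Mflops rate.
    time_start = clock()
    for _ in range(repetitions):
        daxpy(alpha, x, y)
    t1 = clock() - time_start
    mflops = 2.0e-6 * n * repetitions / t1

    # Loop 2: the same work plus one timer call per repetition.
    time_start = clock()
    for _ in range(repetitions):
        clock()
        daxpy(alpha, x, y)
    t2 = clock() - time_start

    timer_overhead = (t2 - t1) / repetitions
    cost_in_flops = 1.0e6 * mflops * timer_overhead
    return mflops, timer_overhead, cost_in_flops

mflops, timer_overhead, cost_in_flops = measure_overhead()
print(f"daxpy rate:     {mflops:.2f} Mflops")
print(f"timer overhead: {timer_overhead:.3e} seconds per call")
print(f"cost in flops:  {cost_in_flops:.2f}")
```

Because t2 - t1 is a small difference between two noisy measurements, a single run can give a tiny or even negative overhead. Running it several times and looking at the spread is in keeping with the experimental approach these notes recommend.
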

The above overhead measurement is not ideal. As we will see later, the computational rate for various kernel operations like the daxpy can vary greatly. So if the overhead is 700 flops for the daxpy operation, it could at the same time be 465 flops for an inner product between two vectors. Nevertheless, it gives an idea of the order of magnitude involved.
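
To see how the same timer overhead turns into different flop counts, here is a small worked example. The overhead of one microsecond and the two kernel rates are hypothetical numbers, picked only so that the results match the 700 and 465 flops quoted above; they are not measurements.

```python
def cost_in_flops(mflops, timer_overhead):
    """Flops that could have been done during one timer call."""
    return 1.0e6 * mflops * timer_overhead

timer_overhead = 1.0e-6   # hypothetical: one microsecond per timer call

# Hypothetical rates in Mflops for two kernels on the same machine.
rates = [("daxpy", 700.0), ("inner product", 465.0)]
for kernel, mflops in rates:
    cost = cost_in_flops(mflops, timer_overhead)
    print(f"{kernel}: about {cost:.0f} flops per timer call")
```
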

Note: as we go through the semester you will be timing many things. And you will be tempted to ask me "How many repetitions do I need to make to get reliable timings?" Resist that temptation. All of the information you need to answer that question is available here, and any additional information you need can only be found by running a few short tests. Scientific computing is an experimental field of research. You should raise questions like that, but whenever the question can be answered by running a quick experiment or two, do so.