Initialize:
    nsamples = 20
    noperations = ???
For k = 1, ... , nsamples
    time_start = mytime()
    sum = 0
    For i = 1, ... , noperations
        sum = sum + 1
    End for
    time_required = mytime() - time_start
    Write out time required
End for
If roughly half of the times are nonzero, then the smallest positive
time is probably the clock resolution.
The setting of noperations is machine dependent. I would recommend
starting out with 100, but don't be surprised if you need to use a value
like 100000 or larger: a fast machine with a low-resolution clock
will require many more operations. Also, on some modern systems
the clock has high enough resolution that even setting
noperations = 1 will not cause any zero timings to appear. If that
is the case, overhead dominates and you can count yourself lucky,
but you can still estimate the resolution using the measured timings.
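The sampling loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: time.perf_counter stands in for mytime(), the helper names sample_loop_times and estimate_resolution are hypothetical, and the starting value noperations = 100 is simply the suggestion from the text. Interpreter overhead in Python is large, so zero timings may never appear with this clock; passing a coarser clock (for example time.time) may behave differently.

```python
import time

def sample_loop_times(noperations, nsamples=20, clock=time.perf_counter):
    """Time nsamples runs of a loop that performs noperations additions."""
    times = []
    for _ in range(nsamples):
        time_start = clock()
        total = 0
        for _ in range(noperations):
            total = total + 1
        times.append(clock() - time_start)
    return times

def estimate_resolution(times):
    """Return the smallest positive timing, or None if all timings are zero."""
    positive = [t for t in times if t > 0.0]
    return min(positive) if positive else None

if __name__ == "__main__":
    times = sample_loop_times(noperations=100)  # 100 is only a starting guess
    for t in times:
        print(t)
    nonzero = sum(1 for t in times if t > 0.0)
    print("nonzero timings:", nonzero, "of", len(times))
    print("smallest positive time:", estimate_resolution(times))
```

If nearly all timings are zero, increase noperations; if none are zero, decrease it, keeping in mind the caveat above about high-resolution clocks.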
y = y + alpha * x,
where x and y are double precision vectors of length n, and alpha is a double precision number. Computing it takes 2n floating point operations (flops): n multiplications and n additions. Typically performance for the daxpy is measured in Mflops, millions of floating point operations per second.
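As a small illustration of the flop count, here is a sketch of the daxpy written as an explicit loop over plain Python lists. The function names daxpy and mflops are hypothetical helpers, not part of any library, and returning the flop count from daxpy is just a convenience for this example.

```python
def daxpy(alpha, x, y):
    """Update y in place with y = y + alpha * x and return the flop count."""
    n = len(x)
    for i in range(n):
        # One multiplication and one addition per element: 2 flops.
        y[i] = y[i] + alpha * x[i]
    return 2 * n

def mflops(flops, seconds):
    """Convert a flop count and an elapsed time in seconds to Mflops."""
    return flops / seconds / 1.0e6

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [1.0, 1.0, 1.0]
    flops = daxpy(2.0, x, y)
    print(y, flops)
    # 2 million flops completed in one second is a rate of 2 Mflops.
    print(mflops(2.0e6, 1.0))
```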
time_start = mytime()
for k = 1, ... , repetitions
    y = y + alpha * x
end for
t1 = mytime() - time_start
Mflops = (2.0e-6)*n*repetitions/t1

time_start = mytime()
for k = 1, ... , repetitions
    time = mytime()
    y = y + alpha * x
end for
t2 = mytime() - time_start
timer_overhead = (t2 - t1)/repetitions
cost_in_flops = 1.0e6 * Mflops * timer_overhead
In the above algorithm, note that the daxpy operation is denoted
as y = y + alpha * x, but in most programming languages a loop will
be needed to carry it out. Also note that the computed Mflops rate
actually includes the overhead of a single timer call. We assume
that this overhead is small compared to the cost of the timing
loop, which can be ensured by making the number of repetitions
sufficiently large.
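The two timing loops above might be sketched in Python like this. The assumptions are mine, not the course's: time.perf_counter stands in for mytime(), the function name measure and the defaults n = 1000 and repetitions = 200 are arbitrary illustrative choices, and alpha is set to a small value only so that y stays bounded over many repetitions. A pure Python loop is dominated by interpreter overhead, so the reported Mflops will be far below what the hardware can deliver.

```python
import time

def daxpy(alpha, x, y):
    """y = y + alpha * x, carried out as an explicit loop."""
    for i in range(len(x)):
        y[i] = y[i] + alpha * x[i]

def measure(n=1000, repetitions=200, clock=time.perf_counter):
    """Return (Mflops, timer overhead in seconds, overhead in flops)."""
    x = [1.0] * n
    y = [0.0] * n
    alpha = 1.0e-9  # illustrative value, chosen so y does not grow large

    # First loop: daxpy only.
    time_start = clock()
    for _ in range(repetitions):
        daxpy(alpha, x, y)
    t1 = clock() - time_start
    mflops = 2.0e-6 * n * repetitions / t1

    # Second loop: one extra timer call per repetition.
    time_start = clock()
    for _ in range(repetitions):
        _ = clock()
        daxpy(alpha, x, y)
    t2 = clock() - time_start

    timer_overhead = (t2 - t1) / repetitions
    cost_in_flops = 1.0e6 * mflops * timer_overhead
    return mflops, timer_overhead, cost_in_flops

if __name__ == "__main__":
    rate, overhead, cost = measure()
    print("Mflops:", rate)
    print("timer overhead (s):", overhead)
    print("timer overhead (flops):", cost)
```

Because t1 and t2 are measured separately, noise can make t2 - t1 small or even negative when the timer call is cheap relative to the daxpy. Repeating the measurement, or shrinking n so the timer call is a larger fraction of each repetition, is one way to check whether the estimate is stable.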
The above overhead measurement is not ideal. As we will see later, the computational rate for various kernel operations like the daxpy can vary greatly. So if the overhead is 700 flops for the daxpy operation, it could at the same time be 465 flops for an inner product between two vectors. Nevertheless, it gives an idea of the order of magnitude involved.
Note: as we go through the semester you will be timing many things, and you will be tempted to ask me "How many repetitions do I need to make to get reliable timings?" Resist that temptation. All of the information you need to answer that question is available here, and any additional information you need can only be found by running a few short tests. Scientific computing is an experimental field of research. You should raise questions like that, but whenever the question can be answered by running a quick experiment or two, do so.