For a vector v of length n, denote its constituent components by v1, v2, ..., vn. The 2-norm is
||v||2 = sqrt(sum(vi*vi)), i = 1, ..., n
Initialize the components to vi = 0.00314159265*i, i = 1, ..., n. That value is somewhat arbitrary, modulo the constraints
The main driver program should be structured roughly as follows:
...
peak_possible = 6.4e9
n = 1.0e8
nflops = 2*n
[declare and allocate the variables, including the x vector and scalar alpha]
[initialize x]
...
nrepetitions = max(1, peak_possible/nflops)
start_time = gettime()
for repetition = 1 to nrepetitions
    alpha = norm2(x, n)
end
end_time = gettime()
elapsed_time = end_time - start_time
Mflop_rate = (1.0e-6)*nrepetitions*nflops/elapsed_time
[output elapsed_time and Mflop_rate]
...
Beware that the value 6.4e9 cannot be held in a 32-bit integer, so peak_possible should be a 64-bit integer: long long or int64_t in C/C++, integer*8 in Fortran, or whatever 64-bit integers are called in the programming language you use. (A plain long int is only 32 bits on some platforms.) This driver code can be the template for other performance tests, so I'd recommend making all of the integers 64-bit, not just peak_possible. Why did I need the max() function in computing nrepetitions?
Fortran users: the language does not require you to pass the argument n in the invocation alpha = norm2(x, n), because the length can be retrieved by an inquiry function such as size(x). Still, pass it explicitly as shown. This will be needed for future codes where a single large vector x is allocated and then values of n smaller than x's size are used.
In Fortran, the declaration and initialization of v should be something like
double precision, allocatable :: v(:)
.
.
.
v(1:n) = 0.314159265d0
In C/C++, the declaration of v should be something like
double v[n];
or
double *v;
C/C++ doesn't have vector assignment capabilities, so the initialization has to be done with a loop.
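Here is a minimal sketch of the pointer form in C, with the loop filling in vi = 0.00314159265*i. The helper name make_vector is hypothetical, and using malloc is my choice: with n = 1.0e8 the array takes 800 MB, which a local declaration like double v[n]; would very likely not fit on the stack on many systems.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates n doubles on the heap and sets v[i-1] = 0.00314159265*i
   for i = 1, ..., n. Returns NULL if n <= 0 or the allocation fails.
   The caller is responsible for calling free() on the result. */
double *make_vector(int64_t n)
{
    if (n <= 0) {
        return NULL;
    }
    double *v = malloc((size_t)n * sizeof *v);
    if (v == NULL) {
        return NULL;
    }
    for (int64_t i = 1; i <= n; i++) {
        v[i - 1] = 0.00314159265 * (double)i;   /* 0-based storage of vi */
    }
    return v;
}
```

A caller would check the returned pointer against NULL before use and call free(v) when finished.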