First project, Computer Science P573


Write a function that computes the vector 2-norm of an array of double precision floating point numbers, and a driver (main) program that times it for a vector of length 108 (or 107 if you find you cannot allocate such a long vecotr). Find the computational rate, and how it compares to the theoretical peak on the machine you used. Keep the norm function in a separate file, called norm2.c or norm2.f90 or more generally norm2.xxx, where xxx is for whatever compiler you use. This is the equivalent of what I did for the vector update operation.

For a vector v of length n, denote its constituent components by v1, v2, ..., vn. The 2-norm is

||v||2 = sqrt(sum(vi*vi)), i = 1, ..., n

Initialize the components to vi = 0.00314159265*i, i = 1, ..., n. That value is somewhat arbitrary, modulo the constraints

  1. Not all entries are equal to zero
  2. None of the entries are ± ∞ or NaN
  3. The values won't cause overflow (exceed the size representable by standards-compliant double precision numbers)
  4. The values won't encounter "denormalized" numbers, roughly speaking, numbers that are less than 1.0e-15 in size

The main driver program should

How many iterations should be made to get a reliable timing? The rule of thumb here is that the size of the timing block should be one second. My 3.2 Ghertz system has a theoretical peak rate of 6.4 Gflops/second (Gflops = 109 floating point operations). The summation in computing the 2-norm requires

2*n = 200 million = 2 * 108 flops

so I should do

(6.4 * 109) ÷ (2 * 108) = 32
repetitions of the norm calculation. Of course, modify the computation to fit whatever machine you are on. Using pseudo-code, the main/driver program should do something like

            ...
            peak_possible = 6.4e9
            n             = 1.0e8
            nflops        = 2*n
            [declare and allocate the variables, including the x vector and scalar alpha]
            [initialize x]
            ...
            nrepetitions  = max(1, peak_possible/nflops)
            start_time = gettime()
            for repetition = 1 to nrepetitions
                alpha = norm2(x, n)
            end
            end_time = gettime()
            elapsed_time = end_time - start_time
            Mflop_rate = (1.0e-6)*nrepetitions*nflops/elapsed_time
            [output elapsed_time and Mflop_rate]
            ...

Beware that the value 6.4e9 cannot be held in a 32-bit integer, and instead peak_possible should be a long int, integer*8, or whatever 64-bit integers are called in the programming language you use. This driver code can be the template for other performance tests, so I'd recommend making all of the integers 64-bit, not just peak_possible. Why did I need the max() function in computing nrepetitions?

Fortran users: the language does not require you to pass in the argument n in the invocation alpha = norm2(x, n), and it can be retrieved by an inquiry function applied to x. Still, explicitly pass it in as shown. This will be needed for future codes where a single large vector x is allocated and then values of n smaller than x's size are used.

Handing In

Send by email to p573@cs.indiana.edu the following:
For the last item, whatever results you get is ok.

Other Considerations




  1. Started : Mon 31 Aug 2009, 07:55 AM
  2. Modified: Mon 31 Aug 2009, 11:22 AM
  3. Last Modified: Wed 02 Sep 2009, 09:45 AM to correct the email address, explain the choice of initialization of v, and be more uniform in specified sizes