Exercise 2, Computer Science A321


Overview

Most of the code for this assignment has already been written, and you just need to modify it and carry out some explorations with it. The primary idea is to test two hypotheses:
  1. A function is much faster than a script, since functions are compiled into "machine code" the first time they are run. This is done by a special program called a "compiler", which analyzes the function and typically performs optimizations that take advantage of the machine's hardware characteristics.
  2. Loops are slow in Matlab compared to using vector operations.
As an example of the second item, the claim is essentially that
        for k = 1:1000
           x(k) = k;
        end
is much slower than
        x = 1:1000;
or
        x = linspace(1,1000,1000);
The last two are "vector operations" since they create or operate upon an entire vector in one statement, without having to index through the vector and set one entry at a time.
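A quick way to see the difference for yourself is to time both forms with tic and toc. This is a minimal sketch, not part of the provided files. The size n is illustrative, and the loop version preallocates x so that only the loop itself is compared (preallocation is a separate question, taken up later). A single tic/toc measurement like this is noisy, which is why the assignment repeats each timing many times.

```matlab
% Sketch: compare a loop with a vector operation for building x = 1:n.
% n is an illustrative size, not a value taken from the assignment files.
n = 1000;

tic;
xloop = zeros(1, n);      % preallocate so that only the loop is being compared
for k = 1:n
    xloop(k) = k;         % set one entry at a time
end
tloop = toc;

tic;
xvec = 1:n;               % one vector operation builds the whole vector
tvec = toc;

fprintf('loop: %g s, vectorized: %g s\n', tloop, tvec);
```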

Instead of setting up a linear vector x as in the examples above, we'll time setting up a 2-dimensional array H with random values, using Matlab's rand() function. Note: the code uses a loop to gather multiple timings of the different versions so that a statistically meaningful conclusion can be drawn. The terms "version with loops" and "no-loop" or "vectorized version" refer only to the segment of code that creates the 2D array H, not to the loop over different timings in the driver script. Only the segments which set up H are timed, so loops outside those segments, such as the driver's timing loop, do not count.

Since all of this code is already written for you, this note about the terminology is just for your edification; the codes are correct already. (I think.)
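To make the distinction concrete, here is a sketch of how repeated timings can be gathered while timing only the statement that builds H. It is not the code in the provided driver.m; n and ntrials are illustrative values, and H = rand(n,n) stands in for whichever version of setting up H is being timed.

```matlab
% Sketch: gather many timings of one way of building H, timing only that step.
% n and ntrials are illustrative values, not taken from the provided m-files.
n = 200;          % H is n-by-n
ntrials = 50;     % number of repeated timings
t = zeros(1, ntrials);

for trial = 1:ntrials      % this driver loop is NOT part of what is timed
    tic;
    H = rand(n, n);        % the "no-loop" (vectorized) way to set up H
    t(trial) = toc;        % only the statement that builds H is inside tic/toc
end

fprintf('mean %g s, std %g s over %d trials\n', mean(t), std(t), ntrials);
```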

While the ostenible goal is to check the performance issues associated with using or not using nloops, and with turning scripts into functions, equally important is to have an exemplar of how to carry out simple computations and graphics with experimental data. So you'll want to document this thoroughly, and for those with a lot of coding experience, you may want to split out the data acquisition from the data analysis in driver.m, and turn one or both into a function call. But this is not a required part of the assignment.

Files

All five of the provided m-files are in the single tar file ex2_files.tar, so you don't have to download each file separately.

driver.m calls the others, does the timings, and processes the timings. You should fill in any missing items in driver.m, especially the xlabels, ylabels, and titles for the plots.
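As a rough guide to processing the timings and labeling the plots, the sketch below reduces a timings array to a mean and standard deviation per method and plots them with error bars. It assumes timings is stored with one row per method and one column per trial, and it fills timings with random placeholder values only so the snippet runs on its own; those values are not measurements. The order of the tick labels is also an assumption, apart from method 4 being the function version with no loops.

```matlab
% Sketch: summarize a timings array and label the plot.
% ASSUMPTION: timings is nmethods-by-ntrials (one row per method), which may not
% match the layout in the provided driver.m. The random values are placeholders
% so the snippet runs by itself; replace them with real measured timings.
nmethods = 4;
ntrials = 30;
timings = 1e-3 * rand(nmethods, ntrials);   % placeholder data, NOT measurements

avg = mean(timings, 2);      % mean over trials: one value per method (column)
dev = std(timings, 0, 2);    % standard deviation over trials, per method
x = (1:nmethods)';           % column of method numbers, same shape as avg

figure;
errorbar(x, avg, dev, 'o');
xlim([0.5, nmethods + 0.5]);
% ASSUMED ordering of methods 1-3; only method 4 is fixed by the handout.
set(gca, 'XTick', 1:nmethods, 'XTickLabel', ...
    {'script loops', 'script no loops', 'function loops', 'function no loops'});
xlabel('Method');
ylabel('Time to set up H (seconds)');
title('Mean time to set up H, with one standard deviation error bars');
```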

Also be sure to pay attention to my comments and notes in the driver script. One technique I used often is to define names for the four different methods and then to use those names instead of the integers 1-4 when indexing into the timing array. It is easier to see at a glance that

      timings(fnoloops,:)
is the vector of timings for the version that uses a function with no loops, than it is to look at
      timings(4,:)
which requires you to remember that "4" in this context means method number 4, and that it corresponds to the function version with no loops.
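A sketch of the naming technique follows. Only fnoloops = 4 comes from this handout; the other names (sloops, snoloops, floops) and their numbers are hypothetical, so match them to whatever driver.m actually defines. Keeping the method labels in a single cell array is one possible design choice: when methods are added, as nzdriver.m will require, nmethods and any labels built from the cell array follow automatically.

```matlab
% Sketch of the naming technique. Only fnoloops = 4 is given in the handout;
% the other three names and numbers are HYPOTHETICAL stand-ins.
sloops   = 1;   % script with loops (hypothetical)
snoloops = 2;   % script with no loops (hypothetical)
floops   = 3;   % function with loops (hypothetical)
fnoloops = 4;   % function with no loops

% One cell array of labels; nmethods is derived from it rather than typed in.
methodnames = {'script loops', 'script no loops', ...
               'function loops', 'function no loops'};
nmethods = numel(methodnames);

ntrials = 10;
timings = zeros(nmethods, ntrials);   % row = method, column = trial

% Reads as "all the timings for the function version with no loops"
fnoloops_times = timings(fnoloops, :);
disp(methodnames{fnoloops})           % prints: function no loops
```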

Questions

Many questions arise when doing this kind of testing. Some are:
  1. Most immediate: are function versions significantly faster than the script versions?
  2. Second most immediate: are the no-loop versions significantly faster than the versions using loops?
  3. Are the conclusions about the relative speed of the four versions valid for other operations? E.g., what if Matlab's rand() function is replaced by Matlab's randn() function? What if instead of setting H to random, it is instead set to something like 1/(3 + sin(x) + cos(x))?
  4. Summarizing the timings with Matlab's std() standard deviation function implicitly assumes the "errors" in the observations have a Gaussian distribution (AKA the "normal distribution"). Here "errors in the observations" means the difference between each timing and the average value of all timings. Do the errors actually have a normal distribution? One way of looking into this is to generate several thousand timings for a single version (e.g., the version in functionnoloops.m), and then plot a histogram of the errors. If the errors are normally distributed, the resulting graph should look like a bell-shaped curve for a sufficiently large number of observations. This is not a course in statistics, so we won't perform more rigorous or sophisticated tests to see if it actually is a normal distribution, but a bell-shaped curve in the histogram would at least suggest we've done the right thing by taking the mean and standard deviation.
  5. In the versions with loops, the array H has been "preallocated" by first setting it to an array of all zeros. This is not strictly necessary, but it helps Matlab by letting it allocate the memory for H once, before the loops start, instead of repeatedly enlarging H as new entries are assigned. How much effect does this have? Answer this by creating two more versions, a script and a function version that are identical to scriptloops.m and functionloops.m but omit the command "H = zeros(n,n);". Call the new m-files nzscriptloops.m and nzfunctionloops.m, and create a new driver m-file called nzdriver.m that tests all six versions of setting up H with rand().
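For question 3, the vectorized form of 1/(3 + sin(x) + cos(x)) needs the elementwise operator ./, since a plain / with a matrix on the right is treated as matrix division, not entry-by-entry division. The sketch below builds H both ways. The handout does not say what x should be, so using an n-by-n array of rand() values for x is an assumption, and n is illustrative. The denominator is never zero, because sin(x) + cos(x) is always at least -sqrt(2).

```matlab
% Sketch: two ways of setting H = 1/(3 + sin(x) + cos(x)) entry by entry.
% ASSUMPTION: x is an n-by-n array of random values; the handout leaves x open.
n = 100;
X = rand(n, n);

% Version with loops (H preallocated, as in the provided loop versions)
Hloop = zeros(n, n);
for i = 1:n
    for j = 1:n
        Hloop(i, j) = 1 / (3 + sin(X(i, j)) + cos(X(i, j)));  % scalar division
    end
end

% Vectorized version: ./ divides elementwise; sin and cos act on every entry
Hvec = 1 ./ (3 + sin(X) + cos(X));

% The two results should agree to round-off
maxdiff = max(abs(Hloop(:) - Hvec(:)));
```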
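For question 4, a sketch of the histogram of errors follows. It uses H = rand(n,n) inline as a stand-in for calling functionnoloops.m, whose exact calling sequence is not given here, and it computes the bins with hist, which exists in Matlab releases old and new (newer releases also provide histogram). The values of n, ntrials, and the number of bins are illustrative.

```matlab
% Sketch: many timings of one version, then a histogram of the errors.
% ASSUMPTION: the inline H = rand(n,n) stands in for functionnoloops.m;
% in your own code, time the call to the provided function instead.
n = 100;
ntrials = 5000;           % "several thousand" timings
t = zeros(1, ntrials);
for trial = 1:ntrials
    tic;
    H = rand(n, n);
    t(trial) = toc;
end

errs = t - mean(t);                  % deviation of each timing from the average
[counts, centers] = hist(errs, 50);  % bin the deviations into 50 bins

figure;
bar(centers, counts);
xlabel('Timing minus mean timing (seconds)');
ylabel('Number of timings');
title('Histogram of timing errors');
```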
That is not an exhaustive list of the issues this topic raises, but be sure to address at least those. The last question comes with a meta-question (which is why I ask you to create a new driver script instead of just adding to the original one): how long and how hard is it to add the two new versions and test them? Notice that the changes to the non-driver m-files are negligible; you just delete a single line from each of their progenitor versions. All of the work is in modifying the driver script to get nzdriver.m, and of course in analyzing the results.
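Before writing the separate nz m-files, a quick inline comparison can show whether preallocation matters at all. The sketch assumes the loop versions fill H one entry at a time with rand(), which is a guess about the provided files rather than a copy of them, and n is an illustrative size. A single run is noisy, so the real comparison belongs in nzdriver.m with repeated timings.

```matlab
% Sketch: effect of preallocation on a loop version of setting up H.
% ASSUMPTION: the loop versions assign H(i,j) = rand() entry by entry.
% The assignment wants these as separate m-files; they are inlined here.
n = 200;

tic;
H = zeros(n, n);          % preallocated, as in scriptloops.m
for i = 1:n
    for j = 1:n
        H(i, j) = rand();
    end
end
tprealloc = toc;

clear H                   % start the next version with no H at all
tic;
for i = 1:n               % no preallocation: H grows as entries are assigned
    for j = 1:n
        H(i, j) = rand();
    end
end
tnoprealloc = toc;

fprintf('preallocated: %g s, not preallocated: %g s\n', tprealloc, tnoprealloc);
```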

If the original driver script is well designed, then it should be fast and easy to create the new driver script nzdriver.m, and most of the changes would consist of adding new lines to deal with the two new cases. Is this the case? How long (in terms of hours/minutes) does it take you to

Handin

Hand in all of the m-files, including your modified driver.m and the new nzdriver.m described above. Your driver scripts should generate the plots needed, so don't bother printing those off; we should be able to see them just by running your codes. Include a 1-3 page writeup with your answers and conclusions, and any new questions that emerged (even if you did not have a chance to address them). Although the four-part breakdown of human time on the meta-questions may not be fully practical, at least give some rough idea of how long you spent getting an nzdriver.m that works correctly.