Exercise 2, Computer Science A321
- Due: Wednesday, 13 Feb 2008
- Credit: 100 points
Overview
Most of the code for this assignment has already been written and you
just need to modify it and carry out some explorations with it.
The primary idea is to test two hypotheses:
- A function is much faster than a script, since functions are
compiled into "machine code" the first time they are run. This is
done by a special program called a "compiler", which analyzes the
function and typically performs optimizations that take advantage of
the machine's hardware characteristics.
- Loops are slow in Matlab compared to using vector operations.
As an example of the second item, the claim is essentially that
    for k = 1:1000
        x(k) = k;
    end
is much slower than
    x = 1:1000;
or
    x = linspace(1,1000,1000);
The last two are "vector operations" since they create or operate upon
an entire vector in one statement, without having to index through the
vector and set one entry at a time.
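Before timing anything, it is worth confirming that the three forms really produce the same vector. The following is a minimal sketch, not part of the provided files; the variable names xloop, xcolon, and xlin are made up for illustration, and the loop version preallocates x, which the snippet above does not.

```matlab
% Sketch: build the same vector three ways and confirm they agree.
n = 1000;

% Loop version: one entry at a time (x is preallocated first).
xloop = zeros(1, n);
for k = 1:n
    xloop(k) = k;
end

% Vector versions: the whole vector in one statement.
xcolon = 1:n;
xlin   = linspace(1, n, n);

% Compare linspace with a small tolerance, since it computes its
% points with floating-point arithmetic.
same = isequal(xloop, xcolon) && max(abs(xlin - xcolon)) < 1e-9;
disp(same)
```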
Instead of setting up a linear vector x as in the examples above,
we'll time setting up a 2-dimensional array H with random values,
using Matlab's rand() function.
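The two ways of setting up H might look roughly like the sketch below. This is only an illustration of the idea: the provided m-files may order the loops or name the variables differently, and the size n = 200 and the names Hloop and Hvec are assumptions.

```matlab
n = 200;   % hypothetical array size, chosen only for illustration

% Version with loops: preallocate, then fill one entry at a time.
Hloop = zeros(n, n);
for j = 1:n          % columns in the outer loop (Matlab stores arrays by column)
    for i = 1:n
        Hloop(i, j) = rand();
    end
end

% No-loop (vectorized) version: one call creates the whole array.
Hvec = rand(n, n);
```

The two arrays hold different random values, so they can be compared by size and range but not entry by entry.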
Note: the code will use a loop to gather multiple timings of
the different versions so that a statistically meaningful conclusion
can be drawn. The terminology of "version with loops" and "no-loop" or
"vectorized version" refers just to the segment of code that creates
the 2D array H, not to the loop over different timings in the driver
script. Only the segments which set up H are timed. So loops
- can occur in the driver.m file (mentioned below),
- are required in scriptloops.m and functionloops.m, and
- are not allowed in scriptnoloops.m and functionnoloops.m.
Since all of those are already written for you, this note about the
terminology is just for your edification - the codes are correct already.
(I think.)
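To make the distinction concrete, here is a minimal sketch of a timing loop in which only the setup of H is timed. It assumes tic/toc as the timer, which may not be what the provided driver.m uses; the values of n and ntests and the name times are also assumptions.

```matlab
n      = 100;   % hypothetical array size
ntests = 20;    % hypothetical number of repeated timings
times  = zeros(1, ntests);

for t = 1:ntests          % driver loop: not part of what is timed
    tic;                  % start the clock just before H is created
    H = rand(n, n);       % the timed segment (a no-loop version)
    times(t) = toc;       % stop the clock right after
end

fprintf('mean = %g s, std = %g s\n', mean(times), std(times));
```

The loop over t is the one that "can occur in the driver"; the line between tic and toc is the part the loops/no-loops terminology refers to.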
While the ostensible goal is to check the performance issues
associated with using or not using loops, and with turning scripts
into functions, it is equally important to have an exemplar of how to
carry out simple computations and graphics with experimental data.
So you'll want to document this thoroughly. If you have
a lot of coding experience, you may want to split out the data
acquisition from the data analysis in driver.m, and turn one or
both into a function call. But this is not a required part of
the assignment.
Files
The set of provided m-files is driver.m, scriptloops.m, scriptnoloops.m,
functionloops.m, and functionnoloops.m.
All five of the files are in the single tar file
ex2_files.tar so you don't have to download
each file separately.
driver.m calls the others, gathers the timings, and processes them.
You should fill in driver.m with any missing items, especially xlabels,
ylabels, and titles for the plots.
Also be sure to pay attention to my comments and notes in the driver script.
One technique I used often is to define names for the four different
methods and then to use those names instead of the integers 1-4 when
indexing into the timing array. It is easier to see at a glance that
timings(fnoloops,:)
is the vector of timings for the version that uses a function with no loops,
than it is to look at
timings(4,:)
which requires you to remember that "4" in this context means method number
4, and that it corresponds to the function version with no loops.
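A small sketch of the naming technique follows. Only fnoloops = 4 comes from the text above; the other three names, their order, and the placeholder timing data are assumptions, so check the driver script for the actual definitions.

```matlab
% Names for the four methods. Only fnoloops = 4 is taken from the text;
% the other three names and their order are assumptions.
sloops   = 1;   % script with loops
snoloops = 2;   % script without loops
floops   = 3;   % function with loops
fnoloops = 4;   % function without loops

ntests  = 5;                   % hypothetical number of timings
timings = rand(4, ntests);     % placeholder data standing in for real timings

% The named index and the bare integer select the same row.
row_by_name   = timings(fnoloops, :);
row_by_number = timings(4, :);
disp(isequal(row_by_name, row_by_number))
```

If the methods are ever renumbered, only the four definitions at the top need to change, not every place the timing array is indexed.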
Questions
Many questions occur in doing this kind of testing. Some are
- Most immediate: are function versions significantly faster than
the script versions?
- Second most immediate: are the no-loop versions significantly faster than
the versions using loops?
- Are the conclusions about the relative speed of the four versions
valid for other operations? E.g., what if Matlab's
rand() function is replaced by Matlab's randn() function?
What if instead of setting H to random, it is instead set to something
like 1/(3 + sin(x) + cos(x))?
- Using Matlab's std() standard deviation function implicitly
assumes the "errors" in the observations have a Gaussian
distribution (AKA the "normal distribution"). Here "errors in
the observations" would be the difference between each timing
and the average value of all timings.
Do the errors actually have a normal distribution?
One way of looking into this is to generate several
thousand timings for a single version (e.g., the version
in functionnoloops.m), and then plot the histogram of errors.
The resulting graph should have a bell-shaped curve for a
sufficiently large number of observations. This is not a course
in statistics so we won't perform more rigorous or sophisticated
tests to see if it actually is a normal distribution, but
a bell-shaped curve in the histogram would at least suggest
we've done the right thing by taking mean and standard deviation.
- In the versions with loops, the array H has been "preallocated" by
first setting it to be an array of all zeros. This is not
strictly necessary, but does help Matlab by letting it allocate
memory before the loops start, so it is creating the workspace
needed just once. How much effect does this have? Answer this
by creating two more versions, a script and function version
that are identical to scriptloops.m and functionloops.m but
omit the command "H = zeros(n,n);". Call the new m-files
nzscriptloops.m and nzfunctionloops.m, and create a new driver
m-file called nzdriver.m that will test all six versions of
setting up H with rand().
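For the question about the distribution of errors, a histogram could be produced along the lines of the sketch below. It is only an outline under several assumptions: tic/toc is used as the timer, rand(n,n) stands in for a call to functionnoloops.m (whose calling sequence is not shown here), and n, ntests, and the choice of 50 bins are arbitrary.

```matlab
n      = 100;     % hypothetical array size
ntests = 2000;    % "several thousand" timings of one version
times  = zeros(1, ntests);

for t = 1:ntests
    tic;
    H = rand(n, n);        % stand-in for the body of functionnoloops.m
    times(t) = toc;
end

errs = times - mean(times);   % difference of each timing from the average

% Bin the errors into 50 bins, then draw the bin counts as a bar chart.
[counts, centers] = hist(errs, 50);
figure;
bar(centers, counts);
xlabel('timing minus mean timing (seconds)');
ylabel('number of observations');
title('Histogram of timing errors');
```

Whether the result looks bell-shaped is exactly what the question asks you to judge, so no particular outcome is assumed here.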
That is not an exhaustive list of issues that are generated by this topic,
but be sure to address at least those. On the last question, a meta-question
is added (the reason I ask you to create a new driver script instead of just
adding to the original one). How long/hard is it to add the two new versions
and test for them? Notice that the changes in the non-driver m-files are
negligible; just delete a single line from their progenitor versions.
All of the work would be in modifying the driver script to get nzdriver.m,
and of course to analyze the results.
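The single-line difference between the preallocated and non-preallocated loop versions can be sketched as follows. The loop structure, the size n, and the names G, t_prealloc, and t_grow are assumptions for illustration; the provided scriptloops.m and functionloops.m may be organized differently.

```matlab
n = 100;   % hypothetical array size

% With preallocation.
tic;
H = zeros(n, n);
for j = 1:n
    for i = 1:n
        H(i, j) = rand();
    end
end
t_prealloc = toc;

% Without preallocation: the same loops with the zeros() line deleted.
clear G;   % make sure G does not survive from an earlier run
tic;
for j = 1:n
    for i = 1:n
        G(i, j) = rand();   % G grows as new entries are assigned
    end
end
t_grow = toc;

fprintf('preallocated: %g s, not preallocated: %g s\n', t_prealloc, t_grow);
```

The clear matters mostly for the script version: a script shares the base workspace, so an array left over from a previous run would already have its full size and would hide the cost of growing it.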
If the original driver script is well-designed then it should be fast and
easy to create the new driver script nzdriver.m, and most of the changes
would consist of adding new lines to deal with the two new cases. Is this
the case? How long (in terms of hours/minutes) does it take you to
- Figure out what is needed, and plan the modifications?
- Modify the timing array and the loop over ntests?
- Add or modify the plotting and graphics parts?
- Debug the whole thing?
Handin
Hand in all of the m-files, including your modified driver.m and the
new nzdriver.m listed above. Your driver scripts should generate the plots
needed, so don't bother printing those off - we should be able to see
them just by running your codes. Include a 1-3 page writeup with your
answers and conclusions, and any new questions that emerged (even if you
did not have a chance to address the new ones).
Although the four-part breakdown of human time on the meta-question may
not be fully practical, at least give some rough idea of how long you spent on
getting an nzdriver.m that works correctly.