Problem Set 1, CSCI A321/A521

Take the data set provided for finding a machine's cache size, do some simple tests, and make the improve the plotting. The files provided are all scripts (versus being functions).

I/O Speed:

Check the sizes of the data-only files. In class I showed that it was much faster to read in from a binary file (but did not really quantify it). Binary files with double precision numbers are usually smaller. The number represented by the string of characters "0.93023180467E+03" takes only 64 bits = 8 bytes to store, but the character string uses 17 characters, and each requires at least one byte. ["At least one" because ASCII representation uses one byte per character, but unicode (used for special symbols, accent marks, and non-English letter characters) takes more.] So here's the question: find the ratio of speeds to read in the data as ASCII characters versus binary. Find the ratio of the sizes of the files containing the ASCII versus binary representations. Are those ratios roughly the same? If so, it means the speed improvement is strictly a matter of the amount of bytes that have to be read in. If not, it means some other mechanism is at play.

Plotting:

Recall the conjecture that the drop off in speed occurs when the vector lengths are a power of two: n = 2^k for some integer k. Draw a black vertical line at each such value of n that occurs between the largest and smallest values of n tested. Just where the dropoff starts is not immediately obvious because of the non-monotonic nature of the data, so this is just for visual estimation. One way of doing this is to "hard-code" the necessary plotting stuff into the code by looking at the smallest (100) and largest (10000) values of n and setting up variables like
first_power_of_2  = 128;
second_power_of_2 = 256;
...
second_power_of_2 = 8192;
and then add them to the graph one by one. The problem with this is that it is not generalizable. If another data set has longer vector lengths, then you'd have to change your Matlab script file by hand and insert additional variables and lines.

So do this in a more generalizable way. Although the data I've provided starts at n = 100 and goes up to n = 10000, in general the three vector operations may have been tested for differing values of n in different orderings. That means you should not assume that saxpy(1,1) contains the minimum value of n that occurs, and saxpy(length(saxpy),1) has the maximum value. But you are guaranteed that the values of n are in the first columns of the performance rates data arrays. So the minimum n that occurs is

minimum_n = min(min(saxpy(:,1)), min(dotpxy(:,1)), min(dotpxx(:,1)))

The three "min"s in the expression above extracts the smallest n for the named array, and the final one takes the minimum of the resulting three numbers. Just plug in "max" where "min" appears above and you'll have the largest value of n that occurs. If it's not clear to you what is going on, break the operations down into steps to help yourself see what is happening. E.g.,

min_saxpy  = min(saxpy(:,1))
min_dotpxy = min(dotpxy(:,1))
min_dotpxx = min(dotpxx(:,1))
array_of_mins = [ min_saxpy, min_dotpxy, min_dotpxx ]
minimum_n = min(array_of_mins)

[And nope, that is not taking baby steps - I generally use and name a lot of intermediate variables for debugging purposes and to help me figure out just what I had in mind when I wrote the script m-file.] Now you can find the exponents of 2 needed, e.g.,

first_exponent = log2(minimum_n);
last_exponent  = log2(maximum_n);
That's now exactly what we need because for minimum_n = 100, Matlab will return
first_exponent = 6.6439
So you need to make sure that it is an integer, by either rounding it up to 7 or down to 6. Similarly for maximum_n = 10000,
last_exponent = 13.288

and you need to round it up or down (I'll let you decide which rounding direction is better). Matlab has two functions for this kind of rounding up or down: ceil() and floor().

Once you have the integer value of first_exponent and last_exponent, the range of values to plot have exponents = first_exponent:1:last_exponent. And the corresponding values of n would be

 powers_of_two = 2.^exponents;

Then for each entry e in the array powers_of_two, add a line between the points (e, 0) and (e, largest_Mflop_value), where I'm assuming you can figure out what largest_Mflop_value is from its name. And how to compute it via the max() function.

So all that hoo-haw is to get to answer the question "how big is my cache?", and tell me what it looks like from your final graph.

Refinements:

Let's turn this blob of code into something that can be reused later.
  1. Will your plotting script work when there are no powers of two between minimum_n and maximum_n? E.g. if the values tested were only the ones between n = 130 and n = 250. Check on that. As a hint, you can find out how many values are in the array exponents using either the length() or size() functions.
  2. The data set I've provides measurements for every value of n from minimum_n to maximum_n in steps of 1. Will your stuff work even if there is a nonunit "stride"? Using a stride of 4, 8, or 10 is common when you want to quickly get results.
  3. Similarly, does your stuff work if the values of n in saxpy(:,1) are not in strictly increasing order?
  4. What happens if multiple Mflops/sec measurements are provided for a single value of n? Notice that I'm not asking you to specifically handle this case, just give it a try and see what happens. For grins, and also because repeated measurements are common in science.
  5. Since it is powers of two that we are interested in, try a log-plot and see if it brings out the desired features better.
Answering the last set of questions will be easily tested if you create data sets with the given properties. For the first question, just read in the full data set and then

saxpy = saxpy(31:151, :); dotpxy = dotpxy(31:151, :); dotpxx = dotpxx(31:151, :);

Setting up the data set to test for the second question will require you to do some indexing ... and I won't give that fun thing away ... yet. For the third question, you can create a data set where the values of n are jumbled up using the randperm() function.

In any case, document what you've done via comments in the script file. We're going to hammer on this poor data set a bit more later on, so keeping all of the intermediate solutions and data set mangling might be useful then.

Finally: I estimate it will take you a combined time of eight hours to do this. But start early, and if you find you are stymied on the first part (putting in vertical lines) then ask for help. Preferably do so via the class web board, but you can ask other students more directly. Just be sure to report such ganging up, and you can turn in one solution set per team. Still, everyone should give this a go on their own ... and if you're not able to handle the first part within two hours, ask for help.


Hand-in:

Email your plotting m-file to a321@cs.indiana.edu. You can put your answers to the question into that m-file as comments, or put them onto a separate text file and mail it along with your plotting m-file.