Making Your Coding Life Easier

In most university coding, you rarely deal with large program management. Typically, you write code that resides in a small number of files, and modify and debug it until it seems to run correctly. Then you turn it in and hope you never see it again.

In reality (i.e., in companies and in research labs) most codes are split into many different files with different people working simultaneously on them. This brings up two needs: the first is to coordinate the updating and changing of files with others and is called version control. The second is to avoid having to (re)compile every source file when just one of them changes. More than one project in scientific computing has been bogged down because it ended up taking over 24 hours to recompile the program, when only one function or file had changed.

A compiler must be able to compile the files separately, and then link all of them together into a working program. When a file changes, it is necessary to compile that file, and any others that depend on it. There is a mechanism called "make" or "makefile" to keep track of this automatically. Next to a text editor, it is probably the most important tool in a coder's toolchest. make is the Unix version of this; other systems such as Microsoft Visual C++ usually come with similar program build facilities. If you use nmake under Windows and avoid dependencies on the MFC libraries, then usually your project can be shipped to a Unix system without too many headaches.


The Simplest Makefile

You can use a makefile for many tasks other than compiling. Compiling is a complicated way to use it, so we'll start with something else. The UNIX "touch" command does the following. When you type

  touch blurb
then:
  1. if the file blurb does not exist, it is created as an empty file;
  2. if blurb already exists, its contents are left alone and its timestamp is updated to the current time.
You can check the file and its timestamp using "ls -l" or your favorite variant, and it may help to do that as this intro proceeds. Here is a simple two-line makefile, with some annotation lines appended:
blurb:
	touch blurb
\______/          /\
 < TAB>           < RET>
There must be a tab character at the start of the second line before the word "touch". If you download a sample makefile using cut and paste from a browser, it almost always changes the tab to spaces, which will cause make to fail with some cryptic message (GNU make complains about a "missing separator"). If it does not work, check that first! In general, copying text with the mouse in X-windows, through the clipboard in Windows, or even with some versions of ftp will convert tabs to spaces, so check that when you transfer a makefile from one place to another. For P573, try to transfer tar'd, gzipped, or zipped copies of a makefile to avoid this problem.
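If you suspect a makefile has been through a tab-eating copy, a few lines of Python can point at the likely culprits. This is only a rough sketch, written in Python rather than make; the function name find_space_indented_lines is made up for this illustration, and the check is a heuristic: it flags every non-blank line that starts with a space, which is where a command line that lost its tab will show up.

```python
def find_space_indented_lines(makefile_text):
    """Return 1-based line numbers of non-blank lines that start with a space.

    Heuristic only: a command line in a makefile must start with a tab,
    so a line starting with a space is a likely victim of tab-to-space
    conversion.
    """
    suspects = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        if line.strip() and line.startswith(" "):
            suspects.append(number)
    return suspects


good = "blurb:\n\ttouch blurb\n"        # command line starts with a real tab
bad = "blurb:\n        touch blurb\n"   # the tab was turned into spaces

print(find_space_indented_lines(good))  # []
print(find_space_indented_lines(bad))   # [2]
```

To check a real file, pass it the file contents, e.g. find_space_indented_lines(open("makefile").read()).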

The makefile must be named "makefile" or "Makefile" (actually, it can be called anything and then invoked using the -f option for the make command, but for now let's keep things simple). When you type "make" in the directory where the makefile lives, in the example above it will print

    touch blurb
and actually execute that same command. If "blurb" did not exist in your directory you will see that it now does - although as an empty file (check via the ls -l command which shows both the last update time and the size of the file). If you now type "make" again, you will see
    make: `blurb' is up to date.
When the file named in the first line exists and is "up-to-date", make recognizes nothing needs to be done and ... doesn't do anything.


The Next Simplest Makefile

Now add some lines to the file "makefile":

    blurb: blob
	    touch blurb

    blob:
	    touch blob
and remove the file "blurb" from your directory. When you type "make" the output is
    touch blob
    touch blurb
which shows that "blob" was created first, followed by "blurb". The way to read this is that
  1. blurb depends on the existence of blob.
  2. to create blob (which depends on nothing), you need to execute the command "touch blob".
  3. once blob exists, to create blurb you need to execute the command "touch blurb".
If blob and blurb both already exist, the reading becomes
  1. blurb depends on blob, and if blob is more recent than blurb (as indicated by the files' time stamps), blurb is out of date.
  2. blob exists and depends on nothing, so the command "touch blob" is not executed.
  3. if blurb is out of date, you need to execute the command "touch blurb" to bring it up to date.
Of course, the lines you added must have a tab in front of the word "touch", not just spaces. So if you tried it and it did not work ... reread the previous part of this page.


Makefile Rules

More generally, the makefile is grouped into "rules". Each rule looks like

    < target >: < dependent 1 > < dependent 2 > ....
	    < command >
	    < command >
When you type "make" with no arguments, a recursive update function is called on the target of the first rule in the file. Make can also be called with an argument, e.g. "make blob", which calls update on a different target. The update function could be written like this in pseudo-code
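    update(target) {
        # first bring every dependent up to date
        for (i in dependents(target)) do
            update(i)
        # then rebuild the target if it is missing or older than a dependent
        if ((exists(target) == false) or
            (date(target) < last_date(dependents(target))))
                execute commands
    }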
In the example above the first call to make causes it to invoke update(blurb). The rule in the makefile for blurb has blob as a dependent. So update(blob) is called. blob has no dependents, and it doesn't exist, so the command "touch blob" is executed. Then update(blob) returns back to update(blurb). Since blurb doesn't exist, it is created as well. Yep, that is the usual "explanation of a recursive function call" that is pretty darned weird, but if you parse it carefully ... never mind.

The second time we type "make", update(blurb) calls update(blob) again, but now blob exists and since it has no dependents, the command "touch blob" is not executed. It then returns back to update(blurb). Now blurb exists and it has a dependent, but the dependent was created slightly before it. So the date comparison fails and blurb is unchanged.
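For readers who would rather run the recursion than parse it, here is a minimal Python sketch of the update pseudo-code. It is a toy model, not how make is actually implemented: files are faked as entries in a dictionary, time is a simple counter, and the names clock, timestamps, rules, touch and make are all made up for this illustration.

```python
import itertools

clock = itertools.count(1)   # fake clock: each "touch" gets the next tick
timestamps = {}              # file name -> time of last touch; absent = no such file

# The two rules of the example makefile: target -> (dependents, command)
rules = {
    "blurb": (["blob"], "touch blurb"),
    "blob": ([], "touch blob"),
}


def touch(name):
    """Create the fake file, or update its time stamp."""
    timestamps[name] = next(clock)


def update(target, log):
    dependents, command = rules[target]
    # first bring every dependent up to date
    for dep in dependents:
        update(dep, log)
    # then rebuild the target if it is missing or older than a dependent
    missing = target not in timestamps
    stale = any(timestamps[dep] > timestamps.get(target, 0) for dep in dependents)
    if missing or stale:
        log.append(command)
        touch(target)        # every command in this example is "touch <target>"


def make(target="blurb"):
    """Return the list of commands that would be executed."""
    log = []
    update(target, log)
    return log


print(make())   # ['touch blob', 'touch blurb']
print(make())   # []
```

In this model, del timestamps["blob"] plays the role of "rm blob" and touch("blob") the role of "touch blob", so you can poke at it between calls to make().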

Now you figure out what happens if you do:

  1. "rm blob", and then "make"
  2. "rm blurb", and then "make"
  3. "touch blob", and then "make"
  4. "touch blurb", and then "make"
Unless you are really cocky and know makefiles well, you should try the above in some directory to double-check your answers.


Makefile Targets

By default, make's target is the target of the first rule in the makefile, which is "blurb" in the dumb example. The command in a rule can be any Unix command. Usually it is a command that creates the target; e.g., in the rules for blurb and blob, the touch command actually created the files blurb and blob, or updated their dates. Other targets can be defined, and a common example is a "clean" target, which cleans up the directory of all the debris created by the makefile:

    blurb: blob
	    touch blurb

    blob:
	    touch blob

    clean:
	    rm blurb blob
Since clean isn't in any list of dependents, it doesn't get called in any recursive update call. But if you type "make clean", make looks for the rule for "clean", notices that a file named "clean" does not exist, and executes the command "rm blurb blob". That causes blurb and blob to be deleted, but it doesn't cause a file named "clean" to be created. So the next time you type "make clean" it will still try to delete blurb and blob. "make clean" is a standard command to get rid of the individual binaries after compiling several separate files, since they're usually not needed. The most common clean target removes object files and archives, as in "rm *.o *.a".

OK, if you are really cocky, what happens with this sequence:

    make
    touch clean
    make clean

The less trivial sample makefile has four explicit targets: go, run, clean, and kleen. "run" depends on the executable "go", and simply executes it. "clean" removes the executable and object files, while "kleen" does the same but also removes any output files named "results".


Makefile Definitions

You can create definitions in the makefile, for the same reason you would make C preprocessor definitions (#define) in a C program. However, a quirk of the make system is that when you refer to the defined thingy, it must be preceded by a dollar sign $, and must be enclosed in parentheses (). Most other preprocessing systems only require that the defined quantity be preceded by the dollar sign, so look out for a mistake like using $CC instead of $(CC).

The sample makefile compiles two files named elapsedtime.c and testtime.c, then links them together into an executable file named "go". Which C/C++ compiler to use is defined in the line

    CC = g++
which in turn means to use the g++ command that your shell recognizes (typically that will be /usr/bin/g++). If you need to use a different one, that line would be changed to (e.g.),
    CC = /usr/local/bin/g++

although now most installations provide the "Modules" system - IU CS does this, so do "man module" to see how to use them. The compiler options to use are specified by

    OPTS = -O3 -Wall
and which additional libraries to link in by
    LIBS = -lrt
and any required include file paths in the line
    INC = -I/usr/include
When specifying how to compile the codes, the strings
    $(CC) $(OPTS) $(INC) -c
are translated into the Unix command
    g++ -O3 -Wall -I/usr/include -c 
The -c option to g++ says compile and create a .o file, but don't yet try linking everything together in an executable. That is done in the stanza that says how to create the executable file named "go". You can define anything you want, but note
  1. Although you do not have to do so, software weenies say it is best to make the things you define in all capitals, so they can be spotted more easily. [This is a convention I often violate, because hitting the shift key takes extra work.]
  2. To use a defined quantity it must be preceded by a dollar sign and enclosed in parentheses. Using $OPTS above would have failed, because make reads it as $(O) followed by the letters PTS.
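To see why the parentheses matter, here is a small Python sketch of how references to definitions get expanded. It is a simplified stand-in for what make does, not make's real parser: the names definitions, REFERENCE and expand are made up for this illustration, and it only handles the two forms $(NAME) and a bare $X, where a bare dollar sign grabs just the single character after it.

```python
import re

definitions = {
    "CC": "g++",
    "OPTS": "-O3 -Wall",
    "INC": "-I/usr/include",
}

# $(NAME) is a full reference; a bare $X uses only the single character X.
REFERENCE = re.compile(r"\$\((\w+)\)|\$(\w)")


def expand(text, defs):
    def substitute(match):
        name = match.group(1) or match.group(2)
        return defs.get(name, "")   # undefined names expand to nothing
    return REFERENCE.sub(substitute, text)


print(expand("$(CC) $(OPTS) $(INC) -c", definitions))  # g++ -O3 -Wall -I/usr/include -c
print(expand("$CC $OPTS -c", definitions))             # C PTS -c
```

The second line is the mistake in action: $CC is taken as the (undefined, hence empty) definition C followed by a literal "C", so the shell would be asked to run a command named "C".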

Even for the small testtime example, a makefile is not overkill: it is far easier to just type "make" or "make run" a year or 10 from now than it is to try to figure out what mystic invocation will compile the code. Using definitions is also a big help, even for programs with just a few files. Changing to a different compiler should just involve changing the definition in one place, instead of hunting down dozens of occurrences of "g++" and having to change them to "icc" or "xlc".

I normally put the definitions into a separate file named "make.inc", and then put the line

    include make.inc
at the top of the makefile. Notice that it is just "include", not "#include". In C/C++ codes, the pound sign (AKA octothorpe) triggers the C pre-processor. In makefiles and Fortran, "include" is a language statement and so does not need a special trigger prefacing it. In make the octothorpe is the comment indicator. Also notice that the file name is not enclosed in quote marks. You can put definitions anywhere in the makefile, as long as they are defined before being used.


Makefile Substitution Rules

... also known as "suffix rules". For P573 you don't really need to know this. Instead of having to specify (with two lines for each file) how to compile all 4000 functions that comprise your MegaBlaster program, you can give a suffix rule. For example,

    .C.o:
            $(CC) -c $(OPTS) $<
says that an object file with suffix ".o" can be created from the file of the same name with suffix ".C", and the way to do that for a file such as fred.C is to apply the command
    g++ -c -O3 -Wall fred.C
The "$<" expands to the name of the ".C" file the rule is being applied to, so this one rule serves every file with that suffix whose object file is needed.

Beware that while an explicit dependency definition would have the .o file dependent on the .C file, as in

    fred.o: fred.C
the suffix rule has the .C before the .o:
    .C.o:
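As a rough illustration of what the suffix rule buys you, the Python sketch below builds the command that the .C.o rule would run for a given object file. It is a toy, not make itself: the function name suffix_rule_command and the file name barney.o are made up for this example, and the definitions are assumed to match the CC and OPTS lines shown earlier.

```python
import os

definitions = {"CC": "g++", "OPTS": "-O3 -Wall"}


def suffix_rule_command(target, source_suffix=".C", target_suffix=".o"):
    """Build the command the .C.o rule would run to create `target`."""
    stem, suffix = os.path.splitext(target)
    if suffix != target_suffix:
        raise ValueError(f"{target} does not end in {target_suffix}")
    source = stem + source_suffix   # this is the name that $< stands for
    return f"{definitions['CC']} -c {definitions['OPTS']} {source}"


for obj in ["fred.o", "barney.o"]:
    print(suffix_rule_command(obj))
# g++ -c -O3 -Wall fred.C
# g++ -c -O3 -Wall barney.C
```

One rule, any number of files: the only thing that changes from file to file is the source name that stands in for $<.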
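Some automatically defined variables include
  1. $@ : the name of the target of the rule.
  2. $< : the name of the first dependent (in a suffix rule, the source file).
  3. $^ : the names of all the dependents (GNU make).
  4. $* : the target name with its suffix removed (in a suffix rule).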


More Makefile Uses and Tweaks

Many people in scientific computing use makefiles for working on a paper using LaTeX: each section is put into a different file, and each is put into a makefile stanza. This is especially helpful if the paper is being jointly authored with multiple people working on it at the same time. But then version control systems like subversion or git become vital, and they are another tool anyone doing coding work in any area should learn. LaTeX typically requires multiple alternating invocations of the commands "latex" and "bibtex". A makefile can keep those straight, so you do not have to keep track of how many times and when bibtex should be executed. When writing a long document like a book or a Ph.D. dissertation, it can be broken down so that only the relevant parts need to be re-run through LaTeX.

Other tips about makefiles:

The last two tips are not specific to the makefile system, but do handle issues that commonly occur in using makefiles.


Makefiles and Fortran

Fortran code has a peculiarity (ok, it has lots of peculiarities, but this one is important for makefiles): in general it does not use include files (and when it does, "include" is treated as a language statement, not a preprocessor directive). Instead, declarations and definitions of variables that will be shared by multiple program units are put into "modules", which are a formal Fortran language entity. A module lives in a Fortran source file (typically with a .f90, .f95, .f2k, .f2003, or .f2008 suffix) which when compiled creates the typical object .o file and a second binary .mod file. Problems emerge because you must know which files need updating when a module's source changes. The problem is a "compilation cascade": a change to a module source that leads to large numbers of unnecessary re-compilations. The module's .mod file may not change when the source and/or .o files change. [Another headache is that the binary .mod file depends on which compiler is used.]

The convention I use is the following:


More on how to REALLY use makefiles