Intro Scientific Computing P573

Making Your Coding Life Easier

In most university coding, you rarely deal with large program management. Typically, you write code that resides in a small number of files, and modify and debug it until it seems to run correctly. Then you turn it in and hope you never see it again.

In reality (i.e., in companies and in research labs), most codes are split into many different files, with different people working on them simultaneously. This brings up two needs. The first is to coordinate the updating and changing of files among people; this is called version control. The second is to avoid having to (re)compile every source file when just one of them changes. More than one scientific computing project has been bogged down because recompiling the program took over 24 hours, even when only one function or file had changed.

A compiler must be able to compile the files separately and then link all of them together into a working program. When a file changes, that file must be recompiled, along with any others that depend on it. The Unix tool "make", driven by a description file called a "makefile", keeps track of this automatically. Next to a text editor, it is probably the most important tool in a coder's toolchest. Other systems such as Microsoft Visual C++ usually come with similar program build facilities. If you use nmake under Windows and avoid dependencies on the MFC libraries, your project can usually be shipped to a Unix system without too many headaches.

The Simplest Makefile

You can use make for many tasks other than compiling. Compiling is actually one of the more complicated ways to use it, so we'll start with something else. The UNIX "touch" command works as follows. When you type

  touch blurb

then:

If the file "blurb" exists, then its timestamp is updated to the current time.
If it does not exist, an empty file named "blurb" is created with 0 length.

You can check the file and its timestamp using "ls -l" or your favorite variant, and it may help to do that as this intro proceeds. Here is a simple two-line makefile, with some annotation lines appended:

blurb:
	touch blurb
\______/          /\
 <TAB>           <RET>

There must be a tab character at the start of the second line, before the word "touch". If you copy a sample makefile from a browser with cut and paste, the tab is almost always converted to spaces, and make then fails with a cryptic message (GNU make reports a "missing separator"). If it does not work, check that first! In general, copying text with the mouse in X Windows, via the clipboard in Windows, or even with some versions of ftp will convert tabs to spaces, so check for this whenever you transfer a makefile from one place to another. For P573, try to transfer tar'd, gzipped, or zipped copies of a makefile to avoid this problem.
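If you want to see the tab problem for yourself, here is a small sketch, assuming GNU make and a POSIX shell. It writes the makefile with printf, so its contents do not depend on how this page was copied. The recipe is deliberately indented with spaces, and the script then shows one way to spot and repair the damage. The four-space indentation is just an assumption made for the example.

```sh
#!/bin/sh
# Sketch: what GNU make does when a recipe line starts with spaces.
cd "$(mktemp -d)" || exit 1

# Broken on purpose: four spaces instead of a tab before "touch".
printf 'blurb:\n    touch blurb\n' > makefile
make            # fails while reading the makefile: "missing separator"

# After any copy/paste, list lines that begin with a space; recipe lines
# should begin with a tab, so these are the suspects.
grep -n '^ ' makefile

# Repair, assuming the indentation is exactly four spaces; printf supplies
# a real tab character for sed to put in their place.
tab=$(printf '\t')
sed "s/^    /$tab/" makefile > fixed.mk && mv fixed.mk makefile
make            # now prints and runs "touch blurb"
```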

The makefile must be named "makefile" or "Makefile" (actually, it can be called anything and then invoked using the -f option of the make command, but for now let's keep things simple). When you type "make" in the directory where the makefile lives, in the example above it will print

    touch blurb

and actually execute that same command. If "blurb" did not exist in your directory you will see that it now does - although as an empty file (check via the ls -l command which shows both the last update time and the size of the file). If you now type "make" again, you will see

    make: `blurb' is up to date.

When the file named in the first line exists and is "up-to-date", make recognizes that nothing needs to be done and ... doesn't do anything.
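The two runs above can be reproduced in a scratch directory with a short sketch like the following. It assumes GNU make, and printf's \t writes the required tab, so the script survives cut and paste. Recent versions of GNU make quote the name as 'blurb' rather than `blurb' in the up-to-date message.

```sh
#!/bin/sh
# Sketch: reproduce the two runs of the simplest makefile.
cd "$(mktemp -d)" || exit 1
printf 'blurb:\n\ttouch blurb\n' > makefile   # \t writes the required tab

make          # blurb is missing: make prints "touch blurb" and runs it
ls -l blurb   # an empty (0-byte) file now exists
make          # nothing to do: make reports that blurb is up to date
```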

The Next Simplest Makefile

Now edit the file "makefile" so that it contains:

    blurb: blob
	    touch blurb

    blob:
	    touch blob

and remove the file "blurb" from your directory. When you type "make" the output is

    touch blob
    touch blurb

which shows that "blob" was created first followed by "blurb". The way to read this is that

blurb depends on the existence of blob.
to create blob (which depends on nothing), you need to execute the command "touch blob".

If blob already exists, the reading becomes

blurb depends on blob; if blurb is missing, or blob is more recent than blurb (as indicated by the files' timestamps), you need to execute the command "touch blurb".
blob depends on nothing and already exists, so no command needs to be run for it.

Of course, the lines you added must have a tab in front of the word "touch", not just spaces. So if you tried it and it did not work ... reread the previous part of this page.
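Here is the same kind of check for the two-rule version. It is again a sketch assuming GNU make and a POSIX shell, with the makefile written by printf so that the tabs are guaranteed.

```sh
#!/bin/sh
# Sketch: the two-rule makefile, with printf guaranteeing the tabs.
cd "$(mktemp -d)" || exit 1
printf 'blurb: blob\n\ttouch blurb\n\nblob:\n\ttouch blob\n' > makefile

make               # prints "touch blob", then "touch blurb"
ls -l blob blurb
```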

Makefile Rules

More generally, a makefile is made up of "rules". Each rule looks like

    <target>: <dependent 1> <dependent 2> ...
	    <command>
	    <command>

When you type "make" with no arguments, a recursive update function is called on the target of the first rule in the file. Make can also be called with an argument, e.g. "make blob", which calls update on a different target. The update function could be written like this in pseudo-code

    update(target) {
        for (i in dependents(target)) do
            update(i)
        if ((exists(target) == false) or
            (date(target) < last_date(dependents(target))))
            execute commands
    }

In the example above, the first call to make invokes update(blurb). The rule in the makefile for blurb has blob as a dependent, so update(blob) is called. The file blob has no dependents and doesn't exist, so the command "touch blob" is executed. Then update(blob) returns to update(blurb). Since blurb doesn't exist, it is created as well. Yep, that is the usual "explanation of a recursive function call" that is pretty darned weird, but if you parse it carefully ... never mind.

The second time we type "make", update(blurb) calls update(blob) again, but now blob exists, and since it has no dependents, the command "touch blob" is not executed. It then returns to update(blurb). Now blurb exists and has a dependent, but the dependent was created slightly before it, so the test date(blurb) < date(blob) is false and blurb is left unchanged.
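To make the pseudo-code concrete, here is a sketch of update() as a POSIX shell script for the blurb/blob example. The helpers deps and recipe are hypothetical stand-ins for what make reads from the makefile. The script only illustrates the idea and is not how make itself is implemented. The date comparison uses find -newer, which is true only when the dependent is strictly newer than the target.

```sh
#!/bin/sh
# Sketch of the update() pseudo-code for the blurb/blob example.
# deps and recipe are hypothetical stand-ins for the makefile's contents.

# deps TARGET: print the dependents of TARGET.
deps() {
    case $1 in
        blurb) echo blob ;;
    esac
}

# recipe TARGET: print the command that (re)creates TARGET.
recipe() {
    case $1 in
        blurb) echo "touch blurb" ;;
        blob)  echo "touch blob" ;;
    esac
}

update() {
    target=$1
    stale=no
    for dep in $(deps "$target"); do
        # Recurse in a subshell: plain sh has no local variables, so this
        # keeps the inner call from overwriting $target and $stale here.
        (update "$dep")
        # find -newer prints $dep only if it is strictly newer than $target.
        if [ -e "$target" ] && [ -n "$(find "$dep" -newer "$target")" ]; then
            stale=yes
        fi
    done
    [ -e "$target" ] || stale=yes
    if [ "$stale" = yes ]; then
        cmd=$(recipe "$target")
        echo "$cmd"
        $cmd        # unquoted on purpose, so "touch blurb" splits into words
    fi
}

cd "$(mktemp -d)" || exit 1
update blurb    # first call: runs touch blob, then touch blurb
update blurb    # second call: nothing is out of date, so nothing runs
```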

Now figure out what happens if you do:

"rm blob", and then "make"
"rm blurb", and then "make"
"touch blob", and then "make"
"touch blurb", and then "make"

Unless you are really cocky and know makefiles well, you should try the above in some directory to double-check your answers.
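If you would rather not type the four experiments by hand, the following sketch runs each one in a scratch directory (GNU make and a POSIX shell assumed). The helper name try is made up for this example. The sleep is there because on some file systems, files touched within the same second get identical timestamps, which would hide the effect of "touch".

```sh
#!/bin/sh
# Sketch: run the four experiments in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'blurb: blob\n\ttouch blurb\n\nblob:\n\ttouch blob\n' > makefile
make > /dev/null   # start with both blob and blurb present

# try CMD [ARGS...]: run one command, then make, and show what make did.
# (try is a made-up helper name.)
try() {
    sleep 1        # keep the new timestamp strictly later than the old ones
    echo "== $*; make"
    "$@"
    make
}

try rm blob
try rm blurb
try touch blob
try touch blurb
```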

Makefile Targets

By default, make updates the target of the first rule in the makefile, which is "blurb" in the dumb example. The command in a rule can be any Unix command, but usually it is a command that creates the target. For example, in the rules for blurb and blob, the touch command actually created the files blurb and blob, or updated their dates. Other targets can be defined; a common example is a "clean" target, which cleans up the directory of all the debris created by the makefile:

    blurb: blob
	    touch blurb

    blob:
	    touch blob

    clean:
	    rm blurb blob

Since clean isn't in any list of dependents, it doesn't get called in any recursive update call. But if you type "make clean", make looks for the rule for "clean", notices that a file named "clean" does not exist, and executes the command "rm blurb blob". That deletes blurb and blob, but it doesn't create a file named "clean", so the next time you type "make clean" it will again try to delete blurb and blob. "make clean" is the standard way to get rid of the intermediate files left over after compiling several separate files, since they're usually not needed afterwards. The most common clean target removes object files and archives, as in "rm *.o *.a".

OK, if you are really cocky, what happens with this sequence:

    make
    touch clean
    make clean

The less trivial sample makefile has four explicit targets: go, run, clean, and kleen. "run" depends on the executable "go" and simply executes it. "clean" removes the executable and object files, while "kleen" does the same but also removes any output files named "results".

Makefile Definitions

You can create definitions in the makefile, for the same reason you would make C preprocessor definitions (#define) in a C program. However, a quirk of the make system is that when you refer to the defined thingy, it must be preceded by a dollar sign $ and enclosed in parentheses () (braces {} also work). Most other preprocessing systems only require that the defined quantity be preceded by the dollar sign, so look out for a mistake like using $CC instead of $(CC).

The sample makefile compiles two files named elapsedtime.c and testtime.c, then links them together into an executable file named "go". Which C/C++ compiler to use is defined in the line

    CC = g++

which in turn means to use the g++ command that your shell recognizes (typically that will be /usr/bin/g++). If you need to use a different one, that line would be changed to (e.g.),

    CC = /usr/local/bin/g++

although most installations now provide the "Modules" system for choosing among installed compilers; IU CS does this, so do "man module" to see how to use it. The compiler options to use are specified by

    OPTS = -O3 -Wall

and which additional libraries to link in by

    LIBS = -lrt

and any required include file paths in the line

    INC = -I/usr/include

When specifying how to compile the codes, the strings

    $(CC) $(OPTS) $(INC) -c

are translated into the Unix command

    g++ -O3 -Wall -I/usr/include -c

The -c option to g++ says to compile and create a .o file, but not yet to link everything together into an executable. That is done in the stanza that says how to create the executable file named "go". You can define anything you want, but note:

Although you do not have to do so, software weenies say it is best to write the things you define in all capitals, so they can be spotted more easily. [This is a convention I often violate, because hitting the shift key takes extra work.]
To use a defined quantity, it must be preceded by a dollar sign and enclosed in parentheses. Using $OPTS above would have failed: make reads it as the one-letter variable $(O) followed by the text "PTS".
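The following sketch writes a small makefile in the spirit of the testtime example and runs it with "make -n", which prints the commands without executing them, so no compiler is needed. The rules are a hypothetical minimal version, not the actual sample makefile. The files elapsedtime.c and testtime.c are created as empty stand-ins, and the target oops is made up to show the $OPTS mistake.

```sh
#!/bin/sh
# Sketch: a hypothetical makefile in the spirit of the testtime example
# (not the actual sample makefile). "make -n" only prints the commands.
cd "$(mktemp -d)" || exit 1
touch elapsedtime.c testtime.c   # empty stand-ins for the real sources

{
    printf 'CC = g++\nOPTS = -O3 -Wall\nLIBS = -lrt\nINC = -I/usr/include\n\n'
    printf 'go: elapsedtime.o testtime.o\n'
    printf '\t$(CC) $(OPTS) -o go elapsedtime.o testtime.o $(LIBS)\n\n'
    printf 'elapsedtime.o: elapsedtime.c\n\t$(CC) $(OPTS) $(INC) -c elapsedtime.c\n\n'
    printf 'testtime.o: testtime.c\n\t$(CC) $(OPTS) $(INC) -c testtime.c\n\n'
    # Hypothetical target "oops" showing the missing-parentheses mistake.
    printf 'oops:\n\techo $OPTS\n'
} > makefile

make -n        # the three g++ commands, with every $(...) expanded
make -n oops   # prints "echo PTS": make reads $OPTS as $(O) followed by PTS
```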

Even for the small testtime example, a makefile is not overkill: it is far easier to just type "make" or "make run" a year from now than to try to figure out, a year or 10 from now, what mystic invocation will compile the code. Using definitions also is a big help, even for programs with just a few files. Changing to a different compiler should just involve changing the definition in one place, instead of hunting down dozens of occurrences of "g++" and having to change them to "icc" or "xlc".

I normally put the definitions into a separate file named "make.inc", and then put the line

    include make.inc

at the top of the makefile. Notice that it is just "include", not "#include". In C/C++ codes, the pound sign (AKA octothorpe) triggers the C pre-processor. In makefiles and Fortran, "include" is a language statement and so does not need a special trigger prefacing it. In make the octothorpe is the comment indicator. Also notice that the file name is not enclosed in quote marks. You can put definitions anywhere in the makefile, as long as they are defined before being used.
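Here is a sketch of the make.inc arrangement, assuming GNU make. The target name show is hypothetical and only echoes the settings. The last line shows a related convenience: a definition given on the make command line overrides the one in the makefile. This is handy for trying a different compiler without editing anything.

```sh
#!/bin/sh
# Sketch: definitions kept in make.inc and pulled in with "include".
cd "$(mktemp -d)" || exit 1
printf '# make.inc: settings shared by the makefile\nCC = g++\nOPTS = -O3 -Wall\n' > make.inc
printf 'include make.inc\n\nshow:\n\t@echo $(CC) $(OPTS)\n' > makefile

make               # prints: g++ -O3 -Wall
make CC=clang++    # command-line definition overrides make.inc: clang++ -O3 -Wall
```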

Makefile Substitution Rules

... also known as "suffix rules". For P573 you don't really need to know this. Instead of having to specify (with two lines for each file) how to compile all 4000 functions that comprise your MegaBlaster program, you can give a suffix rule. For example,

    .C.o:
            $(CC) -c $(OPTS) $<

says that a file with suffix ".o" can be created from the file with the same base name and suffix ".C", and the way to do that is to apply the command

    g++ -c -O3 -Wall fred.C

to the source file, here fred.C. The "$<" expands to the name of that source file (the first prerequisite), so this single rule handles every .C file in the directory whose .o file is needed.

Beware that while an explicit dependency definition puts the .o file first (the .o depends on the .C), as in

    fred.o: fred.C

the suffix rule has the .C before the .o:

    .C.o:
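Here is a sketch of the suffix rule in action, assuming GNU make. The sources fred.C and wilma.C are hypothetical empty files, and "make -n" only prints the commands, so no compiler is needed. The .SUFFIXES line is not required here. It declares the suffixes explicitly, so the rule keeps working even if built-in rules are switched off with -r, which also empties make's default suffix list.

```sh
#!/bin/sh
# Sketch: one suffix rule covers every .C -> .o compile.
# fred.C and wilma.C are hypothetical, empty stand-ins for real sources.
cd "$(mktemp -d)" || exit 1
touch fred.C wilma.C
{
    printf 'CC = g++\nOPTS = -O3 -Wall\n\n'
    # Optional here: declaring the suffixes keeps the rule usable under
    # "make -r", which empties the default suffix list.
    printf '.SUFFIXES: .C .o\n\n'
    printf 'all: fred.o wilma.o\n\n'
    printf '.C.o:\n\t$(CC) -c $(OPTS) $<\n'
} > makefile

make -n       # one g++ line per object file, with $< replaced by the source
make -n -r    # same commands, thanks to the .SUFFIXES line
```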

Some of the automatic variables that make sets for each rule are:

$@ = filename representing the target
$% = member name, when the target is an archive member (for foo.a(bar.o), $% is bar.o and $@ is foo.a)
$< = filename of the first prerequisite
$? = names of all prerequisites (space separated) newer than the target
$^ = filenames of all prerequisites, without duplications
$* = stem of the target filename (e.g. fred when fred.o is built by the .C.o rule)
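The following sketch prints a few of these for one rule. It assumes GNU make, and a.in, b.in and out are made-up names. The prerequisite a.in is listed twice on purpose, so you can see that $^ names it only once.

```sh
#!/bin/sh
# Sketch: print some automatic variables for one rule.
# a.in, b.in and out are made-up names; a.in is listed twice on purpose.
cd "$(mktemp -d)" || exit 1
touch a.in b.in
printf 'out: a.in b.in a.in\n\t@echo "target=$@ first=$< all=$^ newer=$?"\n\t@touch $@\n' > makefile

make    # $^ names a.in only once; $? lists the prerequisites newer than out
```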

More Makefile Uses and Tweaks

Many people in scientific computing use makefiles when writing a paper in LaTeX: each section is put into a different file, and each file gets a makefile stanza. This is especially helpful if the paper is jointly authored, with multiple people working on it at the same time. But then version control systems like subversion or git become vital; they are another tool anyone doing coding work in any area should learn. LaTeX typically requires multiple alternating invocations of the commands "latex" and "bibtex", and a makefile can keep those straight without you having to track how many times and when bibtex should be executed. A long document like a book or a Ph.D. dissertation can be broken down so that only the relevant parts need to be re-run through LaTeX.
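Here is a sketch of one such stanza. It assumes a main file paper.tex that pulls in section files and cites entries from refs.bib; all of these names are hypothetical. The latex, bibtex, latex, latex order is the usual one for resolving citations, and "make -n" prints the sequence without needing LaTeX installed. In this simple version, any change to a section file or to the bibliography triggers the whole sequence.

```sh
#!/bin/sh
# Sketch: one stanza that keeps the latex/bibtex sequence straight.
# paper.tex, intro.tex, methods.tex and refs.bib are hypothetical names.
cd "$(mktemp -d)" || exit 1
touch paper.tex intro.tex methods.tex refs.bib
{
    printf 'paper.dvi: paper.tex intro.tex methods.tex refs.bib\n'
    printf '\tlatex paper\n'    # first pass records the citations in paper.aux
    printf '\tbibtex paper\n'   # builds the bibliography (paper.bbl) from refs.bib
    printf '\tlatex paper\n'    # second pass reads paper.bbl
    printf '\tlatex paper\n'    # third pass settles the cross-references
} > makefile

make -n   # prints the four commands in order without running LaTeX
```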

Other tips about makefiles:

Because the makefile specifies all of the dependencies, make can run independent steps in parallel. Use the option "-j c" on a system with c cores, and make can run almost c times faster; the "almost" is because the dependencies prevent perfect parallelism. If the system is not shared with other users, you can use a value of c larger than the number of cores. E.g., on a hex-core Intel i7 processor with hyperthreading, using make -j 18 to build gcc version 6.1 finishes 11 times faster than make alone.
More than one action can be triggered by a rule; e.g.,
```
    clean: 
        rm -f go *.o *.a
        echo "All tidied up, boss!" >> logfile
```
A tab must precede the "echo", as with any specified action.
If a project spans multiple directories, put a makefile in each directory that compiles the required files and creates a library (.a or .ar) file. Have the overall makefile invoke the subdirectory makefiles only if the timestamp on a subdirectory has changed; that is, make the subdirectories dependents of the overall target, as in
```
    overall: subdir1 subdir2 subdir3
```
Some software engineering folks will say that you should not do this, because it can lead to cascades and unnecessary actions being taken by make. But it does make your makefile system cleaner when each subdirectory creates its own library, and when changes in those subdirectories occur only from modifications of the source code that require actions to be taken anyway.
The "clean" targets listed will cause error or warning messages if the files to be cleaned do not exist, e.g., if no .o or .a files exist. Avoid that by using the -f option to rm:
```
    clean: 
        rm -f *.o *.a
```
If commands like rm or cc have been aliased or are likely to be aliased by other users, either use an explicit path like
```
    clean: 
        /bin/rm -f *.o *.a
```
or use an escape character to force the unaliased command to be used instead:
```
    clean: 
        \rm -f *.o *.a
```

The last two tips are not specific to the makefile system, but do handle issues that commonly occur in using makefiles.
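The following sketch compares the two clean styles from the tips above in an empty scratch directory, assuming GNU make. The files good.mk and bad.mk are made-up names, selected with -f. With plain rm, the unmatched *.o is passed to rm literally, rm fails, and make stops with an error. With rm -f, the rule succeeds and goes on to its second action.

```sh
#!/bin/sh
# Sketch: the two clean styles, run in an empty scratch directory.
cd "$(mktemp -d)" || exit 1
printf 'clean:\n\trm go *.o *.a\n' > bad.mk
printf 'clean:\n\trm -f go *.o *.a\n\techo "All tidied up, boss!" >> logfile\n' > good.mk

make -f bad.mk clean    # rm cannot find the files, so make stops with an error
make -f good.mk clean   # rm -f succeeds quietly, then the echo runs
cat logfile             # All tidied up, boss!
```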

Makefiles and Fortran

Fortran code has a peculiarity (OK, it has lots of peculiarities, but this one is important for makefiles): in general it does not use include files (and when it does, "include" is treated as a language statement, not a preprocessor directive). Instead, declarations and definitions of variables that will be shared by multiple program units are put into "modules", which are a formal Fortran language entity. A module is Fortran source code (typically with a .f90, .f95, .f2k, .f2003, or .f2008 suffix) which, when compiled, creates the typical object .o file and a second binary .mod file. Problems emerge because you must know which files need updating when a module's source changes. The problem is a "compilation cascade": a change to a module's source leads to large numbers of unnecessary recompilations, because the module's .mod file may not change even when its source and/or .o file change. [Another headache is that the binary .mod file depends on which compiler is used.]

The convention I use is the following:

make each source file containing a "use" statement for a module depend upon the module's .mod file
make the executable depend upon the module's .o file, and link that file in
for each module file, use the 2-target form of a makefile dependency, viz.,
```
kindlytypes.o kindlytypes.mod: kindlytypes.f90
	$(F90) $(OPTS) -c kindlytypes.f90
```
That forces rebuilding both the .o and .mod files whenever the source file changes. It means I'm willing to endure compilation cascades to ensure correctness in the build and link phases, as long as the makefile's dependency specs are correct. Fortran 2008 introduced submodules, partly to avoid these cascades and partly to allow separation of interfaces and implementations. If you are using Fortran, look for John Reid's documents on the new features of F2008 for more info and explanation.

More on how to REALLY use makefiles

Nobody in their right mind ever writes a makefile from scratch. Instead, just copy one over from someone else or from another project, and edit it to handle the current project. That, and try to follow some basic rules to keep things from getting really strange.
All makefile targets should be files, or should be phony targets (like clean is in the example makefile). And all phony targets should be specified as such using a line like
```
.PHONY: clean run kleen
```
Use the -d flag to make to have it print the sequence of tests it makes in deciding what actions to take. That will also show you how it recursively takes steps when a dependency is found. It will also show that make tries about 400 different implicit rules for every file; turn that off by using the -r option, or by putting the line
```
MAKEFLAGS += r
```
into your makefile.
If you want to test a makefile, you can run "make -n" which will print the commands that would be executed, but does not actually execute them. That is something you should always test if you have a "clean" or "remove" target.
GNU make barfs badly on file names with spaces in them. Unlike most Unix-oid applications, you cannot just enclose the file name in quote marks to circumvent the problem. Use double backslashes to precede any spaces in a file name, so that dumb file is instead typed into the makefile as dumb\\ file. The best solution is "don't do that". If you have any control over the file names, make dumb file into dumb_file or dumbfile or anything without a space.
Any CS student should learn far more about makefiles than is included on this page, and should not need to be told there are many (and better) online sources. CS students should also learn about tools like autoconf, Eclipse, etc. One advantage of the make system is that it is not tied to a particular programming language or IDE, and is robust and widely available. However, in spite of its ancient and central place in the world of Unix applications and tools, make is a weirdo. For example, double backslashes are the escape mechanism for spaces and all the other special characters ($, ?, *, ...), except for the percent sign, which requires a single backslash to escape it. So \% is treated as the explicit character %, but \\% is interpreted as \%, that is, a backslash followed by a percent sign.
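Here is a sketch of why the .PHONY declaration matters, assuming GNU make; plain.mk and phony.mk are made-up names. A stray file named clean makes an undeclared clean target look up to date, while the declared one always runs. The last line is the "make -n" dry run described in the tips above.

```sh
#!/bin/sh
# Sketch: a stray file named "clean" versus a .PHONY declaration.
cd "$(mktemp -d)" || exit 1
printf 'clean:\n\trm -f blurb blob\n' > plain.mk
printf '.PHONY: clean\nclean:\n\trm -f blurb blob\n' > phony.mk
touch clean

make -f plain.mk clean      # make reports that 'clean' is up to date; rm never runs
make -f phony.mk clean      # runs "rm -f blurb blob" regardless of the file
make -n -f phony.mk clean   # dry run: prints the command without executing it
```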

Started: in the 1980's
Modified: Sun 07 Sep 2014, 01:40 PM to add Fortran notes
Modified: Mon 20 Aug 2018, 06:55 AM formatting
Last Modified: Mon 20 Aug 2018, 07:03 AM