Cluster Usage
This page covers the basic structure of a cluster, the machines available at
IU and in the CS department, how to use one of those machines in particular
(Odin), and rules of thumb for using a cluster politely.
Available Clusters
There are a number of compute resources available to you as students at IU.
This section describes a few of the clusters plus where to find more info.
IU Clusters
General information about compute resources and how to gain access: IU HPS
| Name | Compute Nodes | Cores/node | Mem/node | Networks | Batch Scheduler |
|---|---|---|---|---|---|
| Big Red | 768 | 4 | 8GB | Myricom | LoadLeveler |
| Quarry | 112 | 8 | 8GB | Ethernet | Torque (PBS) |
CS Dept RI Clusters
There are a number of machines that were purchased with grant money. More
detailed information about the machines and how to gain access can be found in
the CS Facilities FAQ.
All of the CS clusters use the SLURM batch scheduler. All the information
below is based on the information in the FAQ. Warning: some of it may be outdated.
| Name | Compute Nodes | Cores/node | Mem/node | Networks | OS |
|---|---|---|---|---|---|
| Odin | 128 | 4 | 4GB | Ethernet, Infiniband | RHEL4 |
| Sif | 8 | 8 | 16GB | Ethernet, Infiniband, Myrinet, Quadrics | RHEL5 |
| Tyr | 16 | 4 | 16GB | Ethernet | RHEL4 |
| Jord (SMP) | 1 | 8 | 64GB | Ethernet | RHEL5 |
| Idun (SMP) | 1 | 16 | 16GB | Ethernet | RHEL5 |
How A Cluster Works
Essentially, a cluster is a collection of compute resources tied together by
one or more networks and by software, so that it looks like one machine from
the outside. Clusters are also typically shared resources, which is where the
software comes in. A batch scheduler handles requests for nodes and fulfills
them based on availability and on how the scheduler was configured. When there
are not enough resources for a submitted job, the job is queued. There may also
be multiple queues to which you can submit a job for different purposes.
Typically, each queue deals only with a subset of nodes that does not overlap
with other queues, and each queue can have different restrictions. For
instance, a debug queue may limit jobs to a small number of nodes and a time
limit of a few hours, so that users can develop and test codes with a fast
turnaround, whereas a production queue will allow large codes to run on many
processors for long periods of time (days).
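As a rough illustration of the queueing behavior described above, here is a toy sketch in shell. It is not how SLURM or any real scheduler is implemented: the function name schedule and the job sizes are made up for the example, and real schedulers also deal with priorities, time limits, and jobs finishing.

```shell
#!/bin/sh
# Toy model: jobs arrive in order, each asking for some number of nodes.
# A job runs if enough nodes are free; otherwise it waits in the queue.
# schedule TOTAL_NODES REQUEST...   (hypothetical helper, not a SLURM command)
schedule() {
    free=$1
    shift
    job=1
    for want in "$@"; do
        if [ "$want" -le "$free" ]; then
            free=$((free - want))
            echo "job $job: running on $want nodes ($free free)"
        else
            echo "job $job: queued (wants $want, only $free free)"
        fi
        job=$((job + 1))
    done
}

# A 128-node machine with three requests of 8, 100, and 64 nodes.
schedule 128 8 100 64
```

The first two requests fit and start immediately; the third asks for more nodes than remain free, so it is queued until earlier jobs release their nodes.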

Using Odin: The Basics
Logging in:
The beauty of computing these days is that you can do it from anywhere, as long
as you have a decent internet connection. Since a cluster looks like a single
machine from the outside, you must first log in to the head node.
To log in to the Odin head node:
ssh odin.cs.indiana.edu -l username
You will then be prompted for your password, unless you have done something
cool with your ssh keys.
Setting up your environment:
Now that you have logged on for the first time, you will need to set up your
environment. Many parallel machines use modules to make compiling and running
applications easy. A module sets up your environment with all the paths,
environment variables, and compilers needed to use a particular package. For
example, if you want to use Python, you load the Python module that is
available on the machine, and it makes sure that all the paths match the
particular version associated with that module.
There are four common commands you need to know about modules:
- module avail: lists the modules that are available on the machine.
odin:> module avail
------------------------ /usr/share/Modules/modulefiles ------------------------
csg intel/mkl/10.0.3.020
csg.use.own module-cvs
dot module-info
intel/cc/10.1.015 modules
intel/cce/10.1.015 mpi/openmpi_gcc-1.2.6(default)
intel/fc/9.0 null
intel/fc/9.1.052 use.own
intel/fce/9.1.052
- module load module_name: loads the module for use
odin:> module load mpi
odin:> module list
Currently Loaded Modulefiles:
1) csg.use.own 4) intel/mkl/10.0.3.020
2) intel/cc/10.1.015 5) csg
3) intel/fc/9.1.052 6) mpi/openmpi_gcc-1.2.6
- module list: lists the modules you have loaded
odin:> module list
Currently Loaded Modulefiles:
1) csg.use.own 3) intel/fc/9.1.052 5) csg
2) intel/cc/10.1.015 4) intel/mkl/10.0.3.020
- module unload module_name: removes the module
odin:> module unload mpi
odin:> module list
Currently Loaded Modulefiles:
1) csg.use.own 3) intel/fc/9.1.052 5) csg
2) intel/cc/10.1.015 4) intel/mkl/10.0.3.020
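To make the idea of a module concrete: loading one roughly amounts to editing environment variables such as PATH for the current shell. The sketch below imitates that by hand. It is not how the module command is implemented, and the function names fake_load and fake_unload and the directory /opt/openmpi-1.2.6/bin are invented for the example.

```shell
#!/bin/sh
# Prepend a package's bin directory to PATH, as loading a module might.
fake_load() {
    PATH="$1:$PATH"
    export PATH
}

# Remove that directory again, as unloading a module might.
fake_unload() {
    # Wrap PATH in colons so every entry can be matched as ":dir:".
    p=":$PATH:"
    # Keep what precedes and what follows the first ":dir:" match.
    case "$p" in
        *":$1:"*) p=${p%%":$1:"*}:${p#*":$1:"} ;;
    esac
    # Strip the colons added above.
    p=${p#:}
    PATH=${p%:}
    export PATH
}

fake_load /opt/openmpi-1.2.6/bin
echo "$PATH"    # the new directory is now searched first
fake_unload /opt/openmpi-1.2.6/bin
```

Real modules typically adjust several variables at once (library paths, manual paths, compiler settings), which is why using the module commands is less error-prone than editing them yourself.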
Using Odin: Job Launch
Before you start grabbing nodes to use, it is a good idea to see who else is
using the machine. To do this, use the squeue command. It lists the jobs
currently in the queue: who owns each job, how many nodes it is using, and how
long it has been running.
odin:> squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
39881 batch BM.NPB.s lee212 R 1-17:34:22 8 odin[001-008]
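When the squeue listing is long, a short awk filter can total the nodes held by running jobs. This is a sketch that assumes the column layout shown above (state in column 5, node count in column 7) and job names without spaces; busy_nodes is a name chosen for this example, not a SLURM command.

```shell
#!/bin/sh
# Sum the NODES column over jobs whose state (ST) is R (running).
# Reads squeue-style output on standard input and skips the header line.
busy_nodes() {
    awk 'NR > 1 && $5 == "R" { total += $7 } END { print total + 0 }'
}

# On the cluster:  squeue | busy_nodes
# Here it is fed the sample listing from above instead:
busy_nodes <<'EOF'
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
39881 batch BM.NPB.s lee212 R 1-17:34:22 8 odin[001-008]
EOF
```

Comparing that total with the number of compute nodes on the machine gives a quick idea of how large an allocation you can expect to get right away.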
Now you can obtain an appropriate number of nodes to work on. There are two
modes that you can work in: interactive and batch.
Interactive Allocation:
In interactive mode, you can interact with the shell as you do on the head
node, but you have access to compute nodes too. It is interactive because the
program you ask the scheduler to execute is a shell rather than some other
executable.
- Ask for X nodes using salloc -N X $SHELL
odin: 81 > salloc -N 5 csh
salloc: Granted job allocation 40025
odin: 1 >
- You are now in a brand new shell but still on the head node. See what nodes you were allocated.
odin: 1 > squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
39881 batch BM.NPB.s lee212 R 1-17:40:07 8 odin[001-008]
40025 batch csh ssfoley R 0:08 5 odin[009-014]
- ssh to one of the nodes.
odin: /san/fsp/foley/tutorial 2 > ssh odin009
ssfoley@odin009's password:
odin009: ~ 1 >
- You can now run parallel programs on your nodes.
odin009: /san/fsp/foley/tutorial 13 > ls
chatter.c hw hw.c make-machinelist to_tmp.c
odin009: /san/fsp/foley/tutorial 5 > mpicc hw.c -o hw
odin009: /san/fsp/foley/tutorial 14 > mpirun -np 20 hw
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
Greetings from process 4!
Greetings from process 5!
Greetings from process 6!
Greetings from process 7!
Greetings from process 8!
Greetings from process 9!
Greetings from process 10!
Greetings from process 11!
Greetings from process 12!
Greetings from process 13!
Greetings from process 14!
Greetings from process 15!
Greetings from process 16!
Greetings from process 17!
Greetings from process 18!
Greetings from process 19!
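Tools that take a machine list usually want one host name per line rather than the compact bracketed form that squeue prints. On SLURM versions that provide it, scontrol show hostnames performs this expansion. The pure-shell sketch below handles only the simple name[first-last] form and is meant as an illustration; expand_nodelist is a hypothetical helper name.

```shell
#!/bin/sh
# Expand "name[first-last]" into one host name per line.
# Anything that is not in that form is printed unchanged.
expand_nodelist() {
    case "$1" in
        *\[*-*\])
            prefix=${1%%\[*}      # text before "["
            range=${1#*\[}        # "first-last]"
            range=${range%\]}     # "first-last"
            first=${range%-*}
            last=${range#*-}
            width=${#first}       # keep the zero padding, e.g. 009
            # Strip leading zeros so 009 is not mistaken for an octal number.
            i=$(echo "$first" | sed 's/^0*//')
            n=$(echo "$last" | sed 's/^0*//')
            i=${i:-0}
            n=${n:-0}
            while [ "$i" -le "$n" ]; do
                printf "%s%0${width}d\n" "$prefix" "$i"
                i=$((i + 1))
            done
            ;;
        *)
            echo "$1"
            ;;
    esac
}

expand_nodelist 'odin[009-013]'
```

Redirecting the output to a file gives a machine list that can be handed to a launcher or used to pick a node to ssh to.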
Batch Allocation:
Batch allocation allows you to submit a script that launches the executable(s)
for you. This is the most common way to use a cluster. To do this, use the
sbatch command. To use sbatch on the command line, you need a script that
contains the launch command (typically mpirun). An example for the hello world
program is:
#!/bin/sh
echo "$SLURM_NODELIST"
echo "$SLURM_TASKS_PER_NODE"
echo "$SLURM_NPROCS"
mpirun -np "$SLURM_NPROCS" ./hw
Here is the script in action.
odin: /san/fsp/foley/tutorial 82 > squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
39881 batch BM.NPB.s lee212 R 1-18:01:11 8 odin[001-008]
odin: /san/fsp/foley/tutorial 86 > sbatch -n 20 runit
sbatch: Submitted batch job 40026
odin: /san/fsp/foley/tutorial 87 > ls
chatter.c hw hw.c make-machinelist runit slurm-40026.out to_tmp.c
odin: /san/fsp/foley/tutorial 88 > cat slurm-40026.out
odin[009-013]
4(x5)
20
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
Greetings from process 4!
Greetings from process 5!
Greetings from process 6!
Greetings from process 7!
Greetings from process 8!
Greetings from process 9!
Greetings from process 10!
Greetings from process 11!
Greetings from process 12!
Greetings from process 13!
Greetings from process 14!
Greetings from process 15!
Greetings from process 16!
Greetings from process 17!
Greetings from process 18!
Greetings from process 19!
You will notice that the output is directed to a file called
"slurm-40026.out". This is the default output file name: "slurm-" + job id +
".out". You can change the output file with the -o flag.
odin: /san/fsp/foley/tutorial 90 > sbatch -n 20 -o blah runit
sbatch: Submitted batch job 40027
odin: /san/fsp/foley/tutorial 91 > cat blah
odin[009-013]
4(x5)
20
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
Greetings from process 4!
Greetings from process 5!
Greetings from process 6!
Greetings from process 7!
Greetings from process 8!
Greetings from process 9!
Greetings from process 10!
Greetings from process 11!
Greetings from process 12!
Greetings from process 13!
Greetings from process 14!
Greetings from process 15!
Greetings from process 16!
Greetings from process 17!
Greetings from process 18!
Greetings from process 19!
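The first three lines of that output are the values echoed by the script: the node list, the tasks per node, and the total number of processes. SLURM writes the tasks-per-node value in a compact form, where 4(x5) means 4 tasks on each of 5 nodes. As a sanity check, the sketch below expands that form and adds it up; total_tasks is a hypothetical helper name, and the comma-separated case (for example 2(x3),1) follows the format SLURM documents for this variable.

```shell
#!/bin/sh
# Add up a SLURM_TASKS_PER_NODE value such as "4(x5)" or "2(x3),1".
total_tasks() {
    echo "$1" | tr ',' '\n' | awk '
        {
            tasks = $0
            nodes = 1
            # "T(xN)" means T tasks on each of N nodes.
            if (match($0, /\(x[0-9]+\)/)) {
                tasks = substr($0, 1, RSTART - 1)
                nodes = substr($0, RSTART + 2, RLENGTH - 3)
            }
            total += tasks * nodes
        }
        END { print total + 0 }'
}

total_tasks '4(x5)'
```

For the run above this gives 20, which agrees with the value of SLURM_NPROCS printed on the third line.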
Passing options as command-line flags gets tedious and is error-prone. The
alternative is to put the options in the batch script itself as #SBATCH lines
(here, a time limit of 10 minutes and an output file named my_outfile):
#!/bin/sh
#SBATCH --time=10
#SBATCH -o my_outfile
echo "$SLURM_NODELIST"
echo "$SLURM_TASKS_PER_NODE"
echo "$SLURM_NPROCS"
mpirun -np "$SLURM_NPROCS" ./hw
A batch script is useful for many things, such as preprocessing, gathering
metadata about a run, running multiple executables (serially), and
postprocessing.
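A sketch of such a script follows. It records a little metadata, runs the program, and then post-processes the output. The file names run_info.txt and hw.out are placeholders chosen for the example, and the fallback values are an addition that lets the script be tried outside SLURM as a dry run.

```shell
#!/bin/sh
#SBATCH --time=10
#SBATCH -o my_outfile

# Fall back to harmless defaults when not running under SLURM.
nprocs=${SLURM_NPROCS:-1}
nodes=${SLURM_NODELIST:-$(uname -n)}

# Preprocessing: gather metadata about the run.
{
    echo "started: $(date)"
    echo "nodes:   $nodes"
    echo "nprocs:  $nprocs"
} > run_info.txt

# Main step: launch the program if it and mpirun are both present.
if command -v mpirun >/dev/null 2>&1 && [ -x ./hw ]; then
    mpirun -np "$nprocs" ./hw > hw.out
else
    echo "mpirun or ./hw not found; skipping the run" > hw.out
fi

# Postprocessing: count the greetings and note the finish time.
echo "greetings: $(grep -c '^Greetings' hw.out)" >> run_info.txt
echo "finished: $(date)" >> run_info.txt
```

Keeping the metadata next to the results makes it easier to tell later which nodes and how many processes produced a given output file.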
Rules of Thumb
There are some things to keep in mind while using shared clusters. Mostly it
comes down to being polite and being aware of your code's needs and potential bugs.
- There are two types of use: testing and development, and production runs.
For testing and development, you want to use a smaller allocation and set
reasonable time and space limits. A bug in your program could get out of hand
and mess up the machine for everyone else. On many machines there are separate
queues for these different activities with default settings that are
appropriate. (This is not the case on Odin. There is only one queue and the
default time is approx. 3 days.)
- Clean up after yourself. Most of the time you will need data from all of
the processes that you are using. If all of the processes are creating output
(writing to files), then you should use local storage (/tmp). This is courteous
to other users because you are not loading the NFS or parallel I/O system,
which can slow down the whole machine. (This is particularly true on Odin!)
However, it means that a script or your parallel code must aggregate the data
and put it somewhere you can access (on NFS) at the end of the job, or at
least not too often. The /tmp directory can get full, so make sure your batch
script cleans up temporary files created during execution.
- When you are choosing a machine to use, consider the memory, interconnect
and processor needs. If you are working on development, a smaller, lightly
loaded machine will be best. For scaling and production runs, a large,
powerful machine might be best. For machines with long queues and applications
with long runtimes, consider setting the option to have an email sent to you
when the job completes.
- If your job will be particularly stressful on the machine, it is nice to
ask permission or at least give a heads up to the other users. Typically,
there is a mailing list for these types of communications as well as for
upgrades and downtimes.
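The advice about local storage and cleanup can be sketched as a batch script. It writes into a private scratch directory under /tmp, copies the combined result to the submission directory once at the end, and removes the scratch directory when the script exits. The loop is a stand-in for real work, the file names are placeholders, and the mail options (standard sbatch options) are shown disabled with a placeholder address. In a multi-node job each node has its own /tmp, so the same steps would have to run on every node; that part is left out here.

```shell
#!/bin/sh
#SBATCH --time=10
# To be notified when the job ends, remove one "#" from each line below
# and replace the placeholder address:
##SBATCH --mail-type=END
##SBATCH --mail-user=username@example.edu

# Private scratch directory on node-local storage.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || exit 1

# Remove the scratch directory when the script exits.
cleanup() {
    rm -rf "$scratch"
}
trap cleanup EXIT

# Stand-in for the real work: each "process" writes its own file.
for rank in 0 1 2 3; do
    echo "result from process $rank" > "$scratch/out.$rank"
done

# Aggregate once, at the end, into a directory you can reach later.
results=${SLURM_SUBMIT_DIR:-.}
cat "$scratch"/out.* > "$results/combined.out"
```

Setting a short --time limit during development also bounds how long a buggy job can hold nodes, which matters on a machine like Odin where the default limit is about three days.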
Created by Samantha Foley
Started: Feb 25 2009