Cluster Usage

This page covers the basic structure of a cluster, the machines available at IU and in the CS department, how to use one of those machines (Odin), and rules of thumb for using a cluster politely.

Available Clusters

There are a number of compute resources available to you as students at IU. This section describes a few of the clusters plus where to find more info.

IU Clusters

General information about compute resources and how to gain access: IU HPS
Name     Compute Nodes  Cores/node  Mem/node  Networks  Batch Scheduler
Big Red  768            4           8GB       Myrinet   LoadLeveler
Quarry   112            8           8GB       Ethernet  Torque (PBS)

CS Dept RI Clusters

There are a number of machines that were purchased with grant money. More detailed information about the machines and how to gain access can be found in the CS Facilities FAQ. All of the CS clusters use the SLURM batch scheduler. All the information below is based on the information in the FAQ. Warning: some of it may be outdated.
Name        Compute Nodes  Cores/node  Mem/node  Networks                                 OS
Odin        128            4           4GB       Ethernet, Infiniband                     RHEL4
Sif         8              8           16GB      Ethernet, Infiniband, Myrinet, Quadrics  RHEL5
Tyr         16             4           16GB      Ethernet                                 RHEL4
Jord (SMP)  1              8           64GB      Ethernet                                 RHEL5
Idun (SMP)  1              16          16GB      Ethernet                                 RHEL5


How A Cluster Works

Essentially, a cluster is a collection of compute resources tied together with one or more networks and software so that it looks like one machine from the outside.

Clusters are also typically shared resources, which is where the software comes in. A batch scheduler handles requests for nodes and fulfills them based on availability and on how the scheduler was configured. When there are not enough resources for a submitted job, the job is queued. There may also be multiple queues to which you can submit a job for different purposes. Typically, each queue deals only with a subset of nodes that does not overlap with other queues, and each queue can have different restrictions. For instance, a debug queue may limit jobs to a small number of nodes and a short time limit (a few hours) so that users can develop and test codes with fast turnaround, whereas a production queue allows large codes to run on many processors for long periods of time (days).
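
The run-or-queue decision described above can be illustrated with a toy model. This is only a sketch: the pool of 8 free nodes, the job names, and the schedule function are made up for illustration, and real schedulers such as SLURM apply much richer policies (priorities, time limits, multiple queues).

```shell
#!/bin/sh
# Toy model only: one pool of nodes, no priorities, no time limits.
free_nodes=8    # hypothetical number of idle nodes

# usage: schedule JOBNAME NODES_REQUESTED
schedule() {
    if [ "$2" -le "$free_nodes" ]; then
        # Enough idle nodes: the job starts right away.
        free_nodes=$((free_nodes - $2))
        echo "$1: running on $2 nodes ($free_nodes free)"
    else
        # Not enough idle nodes: the job waits in the queue.
        echo "$1: queued (wants $2, only $free_nodes free)"
    fi
}

schedule jobA 5
schedule jobB 4
schedule jobC 3
```

Here jobA starts, jobB has to wait because only 3 nodes remain, and jobC still fits. Whether a small job may start ahead of a larger queued one depends on how the real scheduler is set up.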



Using Odin: The Basics


Logging in:

The beauty of computing these days is that you can do it from anywhere, as long as you have a decent internet connection. Since a cluster looks like a single machine from the outside, we must first log in to the head node.

To log in to the Odin head node:
ssh odin.cs.indiana.edu -l username

You will then be prompted for your password, unless you have done something cool with your ssh keys.

Setting up your environment:

Now that you have logged on for the first time, you will need to set up your environment. Many parallel machines use modules to make compiling and running applications easy. Modules allow you to set up your environment with all the necessary paths, environment variables, and compilers to use a particular package. For example, if you want to use Python, you just load the Python module that is available on the machine, and it makes sure that all the paths match the particular version associated with that module.
There are three common commands you need to know about modules:
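
With the widely used Environment Modules package, the commands needed most often are module avail, module load, and module list. The following is a minimal sketch, assuming Odin provides that package; the module name python follows the example above and is not a confirmed module name on Odin, and show_module_basics is just a wrapper written for this illustration.

```shell
#!/bin/sh
# Sketch only: assumes the standard Environment Modules "module" command.
# The module name passed in is a placeholder, not a confirmed Odin module.
show_module_basics() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell"
        return 0
    fi
    module avail        # list every module installed on the machine
    module load "$1"    # add the named package to your environment
    module list         # show the modules that are currently loaded
}

show_module_basics python
```

On a machine without modules the function only prints a notice, so it is safe to try anywhere.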

Using Odin: Job Launch

Before you start grabbing nodes to use, it is a good idea to see who else is using the machine. To do this, use the squeue command. It lists the jobs currently in the queue, including who owns each job, how many nodes it uses, and how long it has been running.
odin:> squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
  39881     batch BM.NPB.s   lee212   R 1-17:34:22      8 odin[001-008]
Now you can obtain an appropriate number of nodes to work on. There are two modes that you can work in: interactive and batch.

Interactive Allocation:

In interactive mode, you can interact with the shell like you do on the head node, but you have access to compute nodes too. It is interactive because you are asking the scheduler to run a shell rather than some other executable.
  1. Ask for X nodes using salloc -N X $SHELL
    odin: 81 > salloc -N 5 csh
    salloc: Granted job allocation 40025
    odin: 1 > 
    
  2. You are now in a brand new shell but still on the head node. See what nodes you were allocated.
    odin: 1 > squeue
      JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
      39881     batch BM.NPB.s   lee212   R 1-17:40:07      8 odin[001-008]
      40025     batch      csh  ssfoley   R       0:08      5 odin[009-014]
    
  3. ssh to one of the nodes.
    odin: /san/fsp/foley/tutorial 2 > ssh odin009
    ssfoley@odin009's password: 
    odin009: ~ 1 > 
    
  4. You can now run parallel programs on your nodes.
    odin009: /san/fsp/foley/tutorial 13 > ls
    chatter.c  hw  hw.c  make-machinelist  to_tmp.c
    odin009: /san/fsp/foley/tutorial 5 > mpicc hw.c -o hw
    odin009: /san/fsp/foley/tutorial 14 > mpirun -np 20 hw
    Greetings from process 1!
    Greetings from process 2!
    Greetings from process 3!
    Greetings from process 4!
    Greetings from process 5!
    Greetings from process 6!
    Greetings from process 7!
    Greetings from process 8!
    Greetings from process 9!
    Greetings from process 10!
    Greetings from process 11!
    Greetings from process 12!
    Greetings from process 13!
    Greetings from process 14!
    Greetings from process 15!
    Greetings from process 16!
    Greetings from process 17!
    Greetings from process 18!
    Greetings from process 19!
    

Batch Allocation:

Batch allocation allows you to submit a script that launches the executable(s) for you. This is the most common way to use a cluster. To do this, use the sbatch command.
To use the sbatch command on the command line, you need a script that contains the launch command (typically mpirun). An example for the hello world program is:
#!/bin/sh
echo "$SLURM_NODELIST"
echo "$SLURM_TASKS_PER_NODE"
echo "$SLURM_NPROCS"
mpirun -np "$SLURM_NPROCS" ./hw

Here is the script in action.
odin: /san/fsp/foley/tutorial 82 > squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
  39881     batch BM.NPB.s   lee212   R 1-18:01:11      8 odin[001-008]
odin: /san/fsp/foley/tutorial 86 > sbatch -n 20 runit
sbatch: Submitted batch job 40026
odin: /san/fsp/foley/tutorial 87 > ls
chatter.c  hw  hw.c  make-machinelist  runit  slurm-40026.out  to_tmp.c
odin: /san/fsp/foley/tutorial 88 > cat slurm-40026.out 
odin[009-013]
4(x5)
20
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
Greetings from process 4!
Greetings from process 5!
Greetings from process 6!
Greetings from process 7!
Greetings from process 8!
Greetings from process 9!
Greetings from process 10!
Greetings from process 11!
Greetings from process 12!
Greetings from process 13!
Greetings from process 14!
Greetings from process 15!
Greetings from process 16!
Greetings from process 17!
Greetings from process 18!
Greetings from process 19!
You will notice that the output is directed to a file called "slurm-40026.out". This is the default output file, "slurm-" + job id + ".out". You can change the output file with the -o flag.
odin: /san/fsp/foley/tutorial 90 > sbatch -n 20 -o blah runit
sbatch: Submitted batch job 40027
odin: /san/fsp/foley/tutorial 91 > cat blah
odin[009-013]
4(x5)
20
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
Greetings from process 4!
Greetings from process 5!
Greetings from process 6!
Greetings from process 7!
Greetings from process 8!
Greetings from process 9!
Greetings from process 10!
Greetings from process 11!
Greetings from process 12!
Greetings from process 13!
Greetings from process 14!
Greetings from process 15!
Greetings from process 16!
Greetings from process 17!
Greetings from process 18!
Greetings from process 19!
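
In the output above, the line 4(x5) is the value of SLURM_TASKS_PER_NODE: 4 tasks on each of 5 nodes, which matches the 20 reported by SLURM_NPROCS. When the counts differ between nodes, SLURM separates the entries with commas, for example 2(x3),1. The helper below is a small sketch for turning such a string into a total; the function name total_tasks is made up for this example.

```shell
#!/bin/sh
# total_tasks: sum a SLURM_TASKS_PER_NODE-style string such as "4(x5)" or "2(x3),1".
total_tasks() {
    total=0
    old_ifs=$IFS
    IFS=,                               # entries are comma-separated
    for entry in $1; do
        case $entry in
            *"(x"*")")
                tasks=${entry%%"("*}    # number before "("
                reps=${entry#*"(x"}     # text after "(x"
                reps=${reps%")"}        # drop the trailing ")"
                ;;
            *)
                tasks=$entry            # plain number: one node with that many tasks
                reps=1
                ;;
        esac
        total=$((total + tasks * reps))
    done
    IFS=$old_ifs
    echo "$total"
}

# Outside a job the variable is unset, so fall back to the value shown above.
default_spec='4(x5)'
total_tasks "${SLURM_TASKS_PER_NODE:-$default_spec}"
```

Run outside a job, this prints 20, the same number of processes that was passed to mpirun in the runs above.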
Passing options as command-line flags gets tedious and is error-prone. The alternative is to put the options in the batch script itself as #SBATCH lines (here, a 10-minute time limit and an output file named my_outfile):
#!/bin/sh
#SBATCH --time=10
#SBATCH -o my_outfile
echo "$SLURM_NODELIST"
echo "$SLURM_TASKS_PER_NODE"
echo "$SLURM_NPROCS"
mpirun -np "$SLURM_NPROCS" ./hw
A batch script is also useful for preprocessing, gathering metadata about a run, running multiple executables (serially), and postprocessing.
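
The uses listed above might be combined in one script along the following lines. This is a sketch, not a tested Odin job: the file name run_metadata.txt and the fallback values are illustrative assumptions, and the launch step is skipped when mpirun or ./hw is missing so that the script can be tried outside a job.

```shell
#!/bin/sh
#SBATCH --time=10
#SBATCH -o my_outfile

# Sketch only: run_metadata.txt and the fallback values are illustrative.
nprocs=${SLURM_NPROCS:-1}

# Preprocessing: record metadata about the run before it starts.
write_metadata() {
    {
        echo "job_id=${SLURM_JOB_ID:-none}"
        echo "nodes=${SLURM_NODELIST:-localhost}"
        echo "nprocs=$nprocs"
        echo "started=$(date)"
    } > "$1"
}

# Launch: use mpirun only when both it and the executable are present.
launch() {
    if command -v mpirun >/dev/null 2>&1 && [ -x ./hw ]; then
        mpirun -np "$nprocs" ./hw
    else
        echo "skipping launch: mpirun or ./hw not found"
    fi
}

write_metadata run_metadata.txt
launch
# Postprocessing: note when the run finished.
echo "finished=$(date)" >> run_metadata.txt
```

Keeping the metadata in a separate file from the job output makes it easy to compare runs later.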

Rules of Thumb

There are some things to keep in mind while using shared clusters. Mostly it comes down to being polite and being aware of your code's needs and potential bugs.
  1. There are two types of use: testing and development, and production runs. For testing and development, you want to use a smaller allocation and set reasonable time and space limits. A bug in your program could get out of hand and mess up the machine for everyone else. On many machines there are separate queues for these different activities with default settings that are appropriate. (This is not the case on Odin. There is only one queue and the default time is approx. 3 days.)
  2. Clean up after yourself. Most of the time you will need data from all of the processes that you are using. If all of the processes are creating output (writing to files), then you should use local storage (/tmp). This is courteous to other users because you are not loading the NFS or parallel I/O system, which can slow down the whole machine. (This is particularly true on Odin!) However, it means that a script or your parallel code must aggregate the data and copy it somewhere you can access (on NFS), either at the end of the job or at least not too often. The /tmp directory can fill up, so make sure your batch script removes the temporary files it creates during execution.
  3. When you are choosing a machine to use, consider the memory, interconnect and processor needs. If you are working on development, a smaller, lightly loaded machine will be best. For scaling and production runs, a large, powerful machine might be best. For machines with long queues and applications with long runtimes, consider setting the option to have an email sent to you when the job completes.
  4. If your job will be particularly stressful on the machine, it is nice to ask permission or at least give a heads up to the other users. Typically, there is a mailing list for these types of communications as well as for upgrades and downtimes.
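
The second rule (write locally, aggregate once, clean up) can be sketched as follows. The directory names and the loop that fakes four processes are illustrative assumptions; results_dir stands in for a directory on NFS. Since /tmp is local to each node, a real multi-node job would need to run the aggregation step on every node it uses.

```shell
#!/bin/sh
# Sketch only: directory names and the fake per-process output are illustrative.
results_dir=${RESULTS_DIR:-$PWD/results}             # stands in for a directory on NFS
scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")    # private scratch on local disk

# Remove the scratch directory on exit, even if the job fails part way.
cleanup() { rm -rf "$scratch"; }
trap cleanup EXIT

# Stand-in for the parallel run: each process writes its own file locally.
for rank in 0 1 2 3; do
    echo "result from process $rank" > "$scratch/out.$rank"
done

# Aggregate once at the end, with a single write to shared storage.
mkdir -p "$results_dir"
cat "$scratch"/out.* > "$results_dir/combined.out"
```

Using trap for the cleanup means /tmp is emptied whether the job finishes normally or stops early with an error.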

Created by Samantha Foley
Started: Feb 25 2009