Using the High Performance Storage System (HPSS)
HPSS is a large storage system with 2 terabytes of hard drives at
the front end and 500 terabytes of tape storage at the back end.
There may be more back-end storage by the time you read this.
Use HPSS to store large data sets that you don't access often.
HPSS is provided by UITS, whose storage group is responsible for
managing it. HPSS data is not backed up (after all, it's already on
archival storage), but UITS duplicates it at IUPUI and IUB, providing
a form of backup in case Bloomington disappears from the face of the
earth.
The following presumes you will use HSI, the direct interface to
HPSS. Two of the other methods, ftp and pftp, involve sending your
password in the clear over the network and are not recommended.
Another mechanism is to use
DFS (distributed file system) software, by which data is moved to/from the
HPSS tape storage system automatically. If you use DFS, beware the
difference between "mirrored" and "archived" filesets; the latter
does not allow you to access your files except through DFS.
In any case, I recommend using HSI. It requires just a single
client to be installed. If you want strong encryption of passwords,
you can also use Kerberos, but UITS has to install it for you; they
do not trust users to handle that. A Kerberos client also requires a
fixed IP address and comes with other restrictions that make it
unsuitable for use from a laptop or behind a firewall.
HSI is already installed on all of the CS department Solaris and
Linux boxes.
Process for getting and using an HPSS account:
- Request a Mass Data Storage account. Instructions for this are at
http://storage.iu.edu/mdss_start.html
- Get your DCE credentials started. It may be that UITS has already
mapped your netpass password to the DCE cell, but you can check/change
the password at
http://password.iu.edu/
- Download the necessary client from http://storage.iu.edu/hsi.html
The client is called "hsi". The RedHat 7 client will still work under
RedHat 8, but it does not work under RedHat 9.
- Execute the "hsi" command and you should be prompted for a
"DCE Principal", which is just your username. Then you'll be
prompted for your password, which is weakly encrypted and sent to
the HPSS front end. Once you are logged in, you can do a pwd and get
something like
/.../dce1.indiana.edu/fs/mirror/k/a/kaddiddlehopper
where the /k/a/ always comes from the first two letters of your
login name (in this example, kaddiddlehopper).
- Now you are ready to use the HSI commands detailed at
http://www.sdsc.edu/Storage/hsi/Doc/ch8.html
Those are similar to ftp but with more flexibility. Common ones:
put, delete, ls, get. This is much like a shell, so you can move
files, rename them, use wildcards, etc.
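The home directory pattern described in the pwd step can be sketched in
a few lines of Python. This is only an illustration: the function name
hpss_home is made up here, and the cell name and the "mirror" path
component are simply copied from the single example path above, so treat
them as assumptions rather than guaranteed values for every account.

```python
def hpss_home(login, cell="dce1.indiana.edu", fileset="mirror"):
    """Build the HPSS home directory expected for a login name.

    Assumption: every account follows the pattern of the one example
    path in this document; cell and fileset defaults come from it.
    """
    if len(login) < 2:
        raise ValueError("login name must have at least two characters")
    # The two intermediate directories are the first two letters of the login.
    first, second = login[0], login[1]
    return f"/.../{cell}/fs/{fileset}/{first}/{second}/{login}"


print(hpss_home("kaddiddlehopper"))
# prints /.../dce1.indiana.edu/fs/mirror/k/a/kaddiddlehopper
```

If pwd inside hsi shows something different, trust hsi and not this
sketch.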
If you want to use other mechanisms (pftp, CFS) then go to the
starter kit for MDSS.
Two significant notes about HPSS:
- Since it is a tape storage device, HPSS is better suited for small
numbers of large files than for large numbers of small files.
So tar your directories before shipping them over.
According to UITS, "optimal applications will store data in files
that are typically larger than 50MB".
- The metadata (file names, directory structure) is immediately
visible when you log in to HSI, because it is stored on hard drive.
The actual data is written to tape, so it can take up to 2 minutes
to access a file. When you "get" a file that has been on HPSS for
some time, you will see a prompt indicating that the scheduler has
scheduled the retrieval, then one indicating that it is retrieving
the files, and finally a progress bar.
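As a rough illustration of the "tar your directories" advice in the
first note, here is a minimal Python sketch that bundles a directory
into one tar file and checks it against the 50MB figure quoted from
UITS. The helper names, the directory name "results", and the reading
of "50MB" as 50,000,000 bytes are all assumptions made for this example.

```python
import tarfile
import tempfile
from pathlib import Path

# "larger than 50MB" per the UITS quote; decimal megabytes is an assumption.
MIN_RECOMMENDED_BYTES = 50 * 1000 * 1000


def bundle_directory(directory, archive_path):
    """Pack a directory into one uncompressed tar file; return its size in bytes."""
    directory = Path(directory)
    archive_path = Path(archive_path)
    with tarfile.open(str(archive_path), "w") as tar:
        # arcname stores paths relative to the directory's own name.
        tar.add(str(directory), arcname=directory.name)
    return archive_path.stat().st_size


def is_good_hpss_candidate(size_bytes):
    """True if the file is above the size UITS describes as optimal."""
    return size_bytes > MIN_RECOMMENDED_BYTES


if __name__ == "__main__":
    # Small self-contained demo using a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "results"  # hypothetical directory of small files
        src.mkdir()
        (src / "run1.txt").write_text("sample output\n")
        size = bundle_directory(src, Path(tmp) / "results.tar")
        print(size, is_good_hpss_candidate(size))
```

Write the archive somewhere outside the directory being bundled, so the
tar file does not end up containing itself. The resulting single file is
what you would then store with the put command from within hsi.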