Storage

Once your account has been created, you will immediately have access to two storage areas:

  1. Home (/home/projectname/username - $HOME)
  2. Data (/data/projectname/username - $DATA)

If you are unable to access either of these directories, please let us know.

Quota

By default, your Home area has a 15GB quota, while the Data area has a 5TB quota that is shared between you and the other members of your project. To check your usage of either area, use the pan_quota command. To check your Home area quota, run:

pan_quota $HOME

and to check the usage of the Data area you would run:

pan_quota -G $DATA

where the "-G" option ensures that the combined usage of you and your project colleagues is returned.
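
The two checks can be combined into one small helper, sketched below. The function name show_quotas is hypothetical; only pan_quota and its -G option come from the text above, and the sketch assumes $HOME and $DATA are set as described.

```shell
# Hypothetical helper: report Home usage, then project-wide Data usage.
show_quotas() {
    # Fail early if pan_quota is not available (e.g. when not logged in to ARC).
    if ! command -v pan_quota >/dev/null 2>&1; then
        echo "pan_quota not found; run this on ARC" >&2
        return 1
    fi
    echo "Home area:"
    pan_quota "$HOME"
    echo "Data area (combined project usage):"
    pan_quota -G "$DATA"
}
```

Calling show_quotas with no arguments prints both reports in turn.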

This storage is for live, operational data actively used on ARC. Any data that has not been touched for six months or more by a user with current access to the system will be removed. Please note that data linked to "active and current" research is not necessarily data that is "actively used" on ARC. It is the second of these that we assess, because it is what matters for ensuring the continuity of ARC storage.
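
To see which files might be at risk, one could list files that have not been modified for roughly six months. This is only a sketch: it assumes that "touched" can be approximated by modification time and six months by 180 days, neither of which is stated above, and the function name list_stale_files is hypothetical.

```shell
# Hypothetical helper: list regular files under a directory whose
# modification time is more than 180 days old (an assumed stand-in for
# "not touched in six months"). Defaults to $DATA when no argument is given.
list_stale_files() {
    find "${1:-$DATA}" -type f -mtime +180 -print
}
```

On a large directory tree this scan is itself a metadata-heavy operation, so it is better pointed at a specific subdirectory than at the whole Data area.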

We can provide more detailed statements of Data area quota usage to project leaders on request. Larger Data quotas (more than 5TB) are available on request as a chargeable service. Please contact ARC support for further information.

Performance

Both the Home and Data areas reside on high-performance parallel storage (Panasas), so performance should be suitable for most workloads. However, large I/O or metadata operations (such as listing file names and other attributes in directories containing many files) can still have a serious impact on the file system. If you believe this may apply to your code, please contact us before running your jobs.
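
Before running a job against a directory, it can help to gauge how many entries it holds. The sketch below counts names only rather than printing attributes for every file; the function name count_entries is hypothetical, and whether a given count is a problem is something to raise with ARC support, not something this check decides.

```shell
# Hypothetical helper: count the entries directly inside a directory.
# find prints one line per entry; no sizes or timestamps are printed,
# unlike "ls -l". Defaults to the current directory.
count_entries() {
    find "${1:-.}" -mindepth 1 -maxdepth 1 | wc -l
}
```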

If your code inadvertently affects the file system for one reason or another, we may be forced to requeue your jobs and put a limit on the number you can run at any time. We will contact you if this becomes necessary.

Scratch

Scratch space is still available on arcus through the $TMPDIR environment variable. However, the scratch folder resides on the same Panasas file system as $HOME and $DATA, so there is limited benefit to using it. Scratch space is not currently available on arcus-b.
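
Where scratch is available, a job might stage its working files under $TMPDIR and copy the results back before finishing. The following is a minimal sketch: the function name run_in_scratch, the results directory argument, and the fallback to /tmp when $TMPDIR is unset are all assumptions made for illustration.

```shell
# Hypothetical helper: run a command inside a private scratch directory,
# then copy whatever it produced to a results directory.
run_in_scratch() {
    results_dir="$1"; shift
    # Create a unique working directory under $TMPDIR (assumed fallback: /tmp).
    work_dir=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || return 1
    # Run the remaining arguments as the command, inside the scratch directory.
    ( cd "$work_dir" && "$@" )
    status=$?
    # Copy the directory contents (including hidden files) back, then clean up.
    mkdir -p "$results_dir" && cp -R "$work_dir"/. "$results_dir"/
    rm -rf "$work_dir"
    return "$status"
}
```

The first argument is the directory that receives the results, and everything after it is the command to run.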

Backups

We do NOT currently create backups of data on the ARC shared file system (although the file system IS resilient to failures). We therefore strongly encourage you to keep copies of your data elsewhere, particularly when that data is critical to your research.
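
One way to keep a copy elsewhere is to bundle a directory into a single compressed archive and transfer that file off ARC. The sketch below covers only the archiving step; the function name make_backup_archive and the archive naming scheme are hypothetical, and the destination for the transfer is left to you.

```shell
# Hypothetical helper: pack a directory into a dated .tar.gz archive and
# check that the archive can be read back.
# Usage: make_backup_archive <source_dir> <output_dir>
make_backup_archive() {
    src_dir="$1"
    archive="$2/$(basename "$src_dir")-$(date +%Y%m%d).tar.gz"
    # -C changes into the parent so the archive holds relative paths.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # Listing the archive is a basic readability check.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}
```

The function prints the path of the archive it created, which can then be copied to storage outside ARC with a tool such as scp or rsync.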

Snapshots

The ARC storage system can provide snapshots, and these have been configured on home directories. Snapshots give access to older versions of files, which is useful if files have been accidentally deleted or overwritten.

Every directory on a volume with snapshots has a .snapshot directory, whose subdirectories are named after the date and time of each snapshot. These subdirectories contain the versions of the files that existed when the snapshot was taken. To list the snapshot directories that are available, type the following command:

ls .snapshot

Note that the .snapshot directory is not visible in directory listings; even "ls -a" does not show it.

If you have accidentally deleted a file that existed before the most recent snapshot was taken, you can retrieve the older copy from a snapshot with the following commands:

cd directory/where/file_has_been_deleted
ls .snapshot

Identify the snapshot directory you want to copy the file from, e.g. 2014.02.19.00.00.02/daily, then copy the file from that snapshot directory to the current location:

cp .snapshot/2014.02.19.00.00.02/daily/file .
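
To see which snapshots still hold a copy of a file before choosing one, the lookup can be sketched as a loop. This assumes every snapshot follows the .snapshot/<timestamp>/<label>/ layout seen in the example above; the function name list_snapshot_versions is hypothetical.

```shell
# Hypothetical helper: print every snapshot copy of a file in the current
# directory, oldest first (the timestamps in the path sort chronologically).
list_snapshot_versions() {
    name="$1"
    for copy in .snapshot/*/*/"$name"; do
        # Skip the unexpanded pattern when no snapshot contains the file.
        [ -e "$copy" ] || continue
        echo "$copy"
    done
}
```

Any path it prints can be passed directly to cp as the source.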

Snapshots are taken weekly, on Mondays at 4am. We keep a limited number of snapshots, and in the event of an issue with the storage, old snapshots may be deleted and the creation of new snapshots may be suspended.