ARC Accounting

ARC keeps an account of the usage of computing resources, covering all jobs from all ARC users.  This page explains

  • what "usage" exactly means;
  • how ARC accounts for that usage;
  • how usage relates to charging; and
  • how system time is allocated to projects and users.

The ARC Accounting Method

A job that runs on ARC systems uses up "credits".  A credit is equal to running 1 processing core for 1 second of wall clock time, and is independent of how fast the processing core is.  An alternative way to express usage is in "core hours".  A core hour is the equivalent of 1 processing core being used for 1 hour of wall clock time, so

3600 credits = 1 core hour

The number of credits "consumed" by a single job depends on a) the amount of resources allocated to the job and b) the duration of the job.  On the ARC systems, the job schedulers allocate resources in node exclusive mode, which means that jobs are allocated resources in units of whole nodes.  Thus, even if a job requests only a single processor core, it will be allocated and charged for a whole node.  The main reason for this type of resource allocation is to guarantee the performance of the application run by the job on that node.  The consequence of exclusive allocation is that it is the responsibility of the user to ensure that each allocated node is fully used by the job and the application it runs.

The number of credits which a job consumes is then calculated thus:

 consumption = (duration of job) x ((number of nodes allocated to the job) x (number of processor cores per node))

The duration of the job is the real, "wall clock" duration, and not CPU time as measured by the operating system.  The duration of a job is measured to the nearest second.  Each compute node on the two main ARC systems (Arcus-B and Arcus-A) has 16 processor cores.

If, for example, a job ran for ten hours on one compute node of Arcus-B, the number of credits consumed would be:

(10 x 3600) x (1 x 16) = 576,000 credits

To reiterate the important point about exclusive allocation, the same number of credits would be charged regardless of whether a single core or all 16 were requested.
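
The calculation above can be sketched in a few lines of Python.  This is only an illustration of the formula on this page; the function names job_credits and credits_to_core_hours are made up for the example and are not part of any ARC tool.

```python
SECONDS_PER_HOUR = 3600
CORES_PER_NODE = 16  # Arcus-B and Arcus-A, as stated on this page


def job_credits(duration_seconds, nodes, cores_per_node=CORES_PER_NODE):
    """Credits consumed under node exclusive allocation.

    The number of cores actually requested does not appear here:
    whole nodes are charged regardless.
    """
    return duration_seconds * nodes * cores_per_node


def credits_to_core_hours(credits):
    """Convert credits (core seconds) to core hours."""
    return credits / SECONDS_PER_HOUR


ten_hours = 10 * SECONDS_PER_HOUR
print(job_credits(ten_hours, nodes=1))                         # 576000
print(credits_to_core_hours(job_credits(ten_hours, nodes=1)))  # 160.0
```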

Some applications are simply serial (a single process with a single thread of execution) and cannot be changed.  In that case, in order to utilise the cores available on a node, it is possible to "pack" more than one process of the same application inside a single job executing on a single node.  This mechanism is described here, and works extremely well to parallelise workloads in which the same application is mapped to a large number of cases, e.g. a Monte-Carlo simulation or a parameter sweep.
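
As a rough planning aid, the sketch below groups a list of serial cases into batches of up to 16, so that each batch could be run as one single-node job.  It only illustrates the bookkeeping; it is not the packing mechanism referred to above, and the name pack_cases is hypothetical.

```python
def pack_cases(cases, cores_per_node=16):
    """Split cases into batches, one batch per single-node job."""
    return [
        cases[i:i + cores_per_node]
        for i in range(0, len(cases), cores_per_node)
    ]


# A parameter sweep of 40 cases needs 3 single-node jobs.
batches = pack_cases(list(range(40)))
print(len(batches))                    # 3
print([len(b) for b in batches])       # [16, 16, 8]
```

Note that the last batch in this example occupies only 8 of the 16 cores, yet the job running it is still charged for the whole node.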

GPU Accounting

GPU processing is charged at the rate of 8 normal CPU cores per GPU.  This is because a GPU compute node on Arcus-B has 2 GPUs and 16 cores in total.  Therefore, using one GPU is considered equivalent to occupying 8 cores for the duration of the GPU processing.

For example, running on a single GPU for ten hours would consume:

(10 x 3600) x (1 x 8) = 288,000 credits
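
The same arithmetic as a short Python sketch.  The per-GPU rate is derived from the node figures given above (16 cores shared between 2 GPUs); the function name gpu_credits is made up for the example.

```python
CORES_PER_NODE = 16  # cores in a node, as stated above
GPUS_PER_NODE = 2    # GPUs in an Arcus-B GPU node, as stated above


def gpu_credits(duration_seconds, gpus=1):
    """Credits consumed by GPU processing, charged as CPU-core equivalents."""
    cores_per_gpu = CORES_PER_NODE // GPUS_PER_NODE  # 8
    return duration_seconds * gpus * cores_per_gpu


print(gpu_credits(10 * 3600))          # 288000
print(gpu_credits(10 * 3600, gpus=2))  # 576000, same as a whole 16-core node
```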

How Credits are Managed

ARC uses an accounting package called GOLD.   This is coupled with some homegrown extensions and hooks into the job schedulers.  By default, users consume credits from a pool belonging to their project when they run jobs. 

Every user can check how many credits they have available with the mybalance command.  Some typical output of this command is shown below:

prompt> mybalance
Please wait: Calculating balance ...
Your are a member on the following project(s): dept-proj
and your total balance is: 999871360 credits (~277742hours)
 
Detailed account balance:
Id Name      Amount    Reserved Balance   CreditLimit Available
-- ------    --------- -------- --------- ----------- ---------
51 dept-proj 999871360 0        999871360 0           999871360

In the above example, the user can access the credits of the "dept-proj" ARC account, to which the user belongs.  The number of credits available to the project is retrieved from the GOLD accounting package and printed.

Another useful command is sacct, provided by the Slurm job scheduler to print information about jobs.  To find the elapsed time for a job, you can use the command:

sacct --job=jobnumber --format=JobID,elapsed

The elapsed time printed by sacct has to be converted to hours (a value in seconds is divided by 3600).
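
sacct usually reports the elapsed field as [days-]hours:minutes:seconds rather than as a plain number of seconds.  Assuming that format, a conversion to hours could look like the sketch below; the function name elapsed_to_hours is hypothetical and the input strings are invented examples, not real job records.

```python
def elapsed_to_hours(elapsed):
    """Convert an elapsed string such as '1-02:30:00' or '10:00:00' to hours.

    Assumes the form [days-]hours:minutes:seconds; a shorter
    'minutes:seconds' value is also accepted.
    """
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    while len(parts) < 3:  # pad missing leading fields with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_seconds / 3600


print(elapsed_to_hours("10:00:00"))    # 10.0
print(elapsed_to_hours("1-02:30:00"))  # 26.5
```

Multiplying the result by the number of cores allocated (16 per node) gives the core hours used by the job so far.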

What to do when credits are exhausted?

Every new project on ARC is given an initial allocation of credits, worth 25,000 core-hours of computing.  While this is normally sufficient for testing and benchmarking an application, and even for a few production runs, it usually proves insufficient for carrying out all the intended computational work.  Individual users (or the ARC account holders) can request additional credits at any time; please refer to this page to find out the details on how to request more credits.
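
To put the initial allocation in perspective, the short Python sketch below converts it to credits and to whole-node hours, using only the figures given on this page (3600 credits per core hour, 16 cores per node).

```python
SECONDS_PER_HOUR = 3600
CORES_PER_NODE = 16
INITIAL_CORE_HOURS = 25_000

# 1 core hour = 3600 credits
initial_credits = INITIAL_CORE_HOURS * SECONDS_PER_HOUR

# Jobs are charged for whole nodes, so the allocation lasts this many node hours.
node_hours = INITIAL_CORE_HOURS / CORES_PER_NODE

print(initial_credits)  # 90000000
print(node_hours)       # 1562.5
```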