ARC Service Level Agreements

Background

The Advanced Research Computing (ARC) service is primarily supported through a Divisional Funding model. Under this model, Divisions have agreed to provide funding to support the ARC service at a level which is directly linked to each Division's usage of ARC's compute resources.

The level of usage by each Division is determined by a historical review of the compute time each Division has consumed over a rolling three-year window. This information is then used to forecast likely usage in the following (i.e. fourth) year, and this forecast determines the percentage contribution that each Division will need to make to the ARC service's operating budget.
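The page does not say exactly how the forecast or the resulting percentage is calculated. As a minimal sketch, assuming the fourth-year forecast is simply the mean of the three years in the window and that each Division's contribution is its share of the total forecast, the arithmetic looks like this (the function names and core-hour figures are hypothetical):

```python
from statistics import mean


def forecast_usage(history: list[float]) -> float:
    """Assumed forecast: the mean of the most recent three years of usage."""
    window = history[-3:]  # rolling three-year window
    return mean(window)


def contribution_shares(usage: dict[str, list[float]]) -> dict[str, float]:
    """Each Division's share of the total forecast, as a percentage."""
    forecasts = {div: forecast_usage(hist) for div, hist in usage.items()}
    total = sum(forecasts.values())
    return {div: 100 * f / total for div, f in forecasts.items()}


# Hypothetical core-hour figures, for illustration only.
usage = {
    "MPLS": [900_000, 1_000_000, 1_100_000],
    "SSD": [60_000, 80_000, 100_000],
    "Hum": [30_000, 40_000, 50_000],
}
shares = contribution_shares(usage)
for div, pct in shares.items():
    print(f"{div}: {pct:.1f}%")
```

A real model might weight recent years more heavily or add a growth trend; the proportional split is the part that follows directly from the description above.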

Please note: currently only the Mathematical, Physical and Life Sciences (MPLS), Humanities (Hum) and Social Sciences (SSD) Divisions subscribe to the Divisional Funding model. The Medical Sciences Division (MSD) does not currently subscribe to it.

Although ARC's services are available free at the point of access to researchers at the University, three Quality of Service levels apply to ARC projects, depending on the ARC funding arrangements for your Division. ARC's principal quality of service levels are Priority Access, Standard Access and Basic Access.

Description of Quality of Service Levels

As noted above, the ARC service operates three Quality of Service (QoS) levels. A QoS level defines the framework within which an ARC project and its users can run jobs on the ARC service. It sets parameters such as the priority of jobs submitted by a project, the maximum length (wall clock time) of a job that can be submitted, and the number of jobs a user may have running concurrently on the cluster resources.

"Priority" QoS

This quality of service level is aimed at groups who require sustained amounts of compute time with a higher default job priority than the Standard QoS provides. It is available only by prior arrangement with the ARC team and is a paid-for service, in addition to any access a project already has through the Standard or Basic QoS levels.

Priority QoS credits must be purchased in advance and remain available to the purchasing project until they are exhausted. Once a project has exhausted its Priority credits, its future jobs will need to run at Standard QoS unless more credits are purchased. Projects running low on Priority credit should top up their accounts in good time if they still need priority access.

Please note: users should set all jobs submitted to the ARC system to run at Standard or Basic QoS unless Priority is required. This guards against a project user inadvertently using up purchased Priority credits, and lets project users save Priority credits for more urgent jobs or for periods when higher throughput is needed.
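The credit rules above can be modelled as a simple ledger. This is a hypothetical sketch, not ARC's accounting system: it assumes credits are tracked in core hours, that a job runs as Priority only when the user asks for it and enough credit remains, and that it otherwise falls back to Standard QoS. The names `ProjectCredits`, `choose_qos` and `needs_top_up`, and the low-credit threshold, are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ProjectCredits:
    """Hypothetical record of a project's remaining Priority credit (core hours)."""
    priority_core_hours: float
    low_threshold: float = 500.0  # illustrative "top up soon" level, not an ARC figure


def choose_qos(credits: ProjectCredits, job_core_hours: float, want_priority: bool) -> str:
    """Return the QoS a job would run under in this simplified model."""
    if want_priority and credits.priority_core_hours >= job_core_hours:
        # Debit the Priority balance for the whole job up front (a simplification).
        credits.priority_core_hours -= job_core_hours
        return "Priority"
    # Priority not requested, or too little credit left: fall back to Standard.
    return "Standard"


def needs_top_up(credits: ProjectCredits) -> bool:
    """Flag projects whose Priority balance has fallen to the low threshold."""
    return credits.priority_core_hours <= credits.low_threshold


project = ProjectCredits(priority_core_hours=1_000)
print(choose_qos(project, 800, want_priority=True))   # Priority
print(choose_qos(project, 800, want_priority=True))   # Standard: only 200 left
print(choose_qos(project, 100, want_priority=False))  # Standard: Priority saved
print(needs_top_up(project))                          # True
```

Passing `want_priority=False` for routine jobs mirrors the advice that jobs should run at Standard or Basic QoS unless Priority is genuinely needed, so the balance is kept for urgent work.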

In summary, Priority QoS is associated with projects that fund additional ARC access time directly from grants or other funding sources and is characterised by:

  • Jobs can be flagged to run at either Priority or Standard/Basic QoS, saving Priority for urgent jobs or periods of high throughput;
  • Jobs are scheduled with the highest priority and so move through the queues faster than jobs with Standard or Basic QoS;
  • As with Standard QoS, jobs can be submitted to queues on the cluster services;
  • Priority use is not governed by the Fair Share algorithm.

"Standard" QoS

The Standard QoS level applies to all ARC projects and users whose home Department is based within the MPLS, SSD or Humanities Divisions. Standard QoS access is paid for under the Divisional Funding model described previously (see Background) and is therefore free at the point of use for projects within these three Divisions.

In summary, Standard QoS is associated with all projects where a Principal Investigator’s home Department is based within one of the above three Divisions covered by the Divisional funding model. This allows free use at the point of access and the following scheduling attributes:

  • Jobs will have a lower (Standard) prioritisation than jobs with the higher Priority QoS and will therefore be scheduled around Priority QoS jobs;
  • Jobs will have access to all available submission queues within the ARC service;
  • Users will be able to submit multiple jobs to all queues/partitions on the cluster and to have multiple concurrent jobs running on the service.

Overall usage of the system will be governed and controlled through the scheduler “Fair Share” policy, which acts to prevent the overuse of compute resources by any individual or group.

"Basic" QoS

The Basic QoS level applies to all non-funded projects at the University that lie outside the Divisional Funding model and do not use other external or internal funding sources to buy access to ARC. As of 2020/21, this QoS mainly applies to projects sponsored by Principal Investigators based within MSD. Basic QoS nevertheless recognises that researchers within MSD require a degree of access to ARC resources, and so still provides free use at the point of access, but under a more restrictive QoS than Standard QoS.

In summary, Basic QoS is associated with all projects where a Principal Investigator’s home Department is based within an MSD Department. Basic QoS allows free use at the point of access with the following scheduling attributes:

  • Jobs will have a lower prioritisation than jobs submitted with Standard QoS and will therefore be scheduled around Standard and Priority QoS jobs;
  • Users will only be able to submit jobs of up to 24 hours (wall clock time) in length;
  • Users will be able to submit multiple jobs, BUT will only be able to run one job at a time on the clusters;
  • As with Standard QoS, overall usage of the system will be governed and controlled through the scheduler “Fair Share” policy.
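The two Basic QoS limits above can be expressed as a simple admission check. This is a hypothetical sketch rather than ARC's scheduler configuration: it encodes only the 24-hour wall clock limit and the one-running-job limit stated here, leaves Standard and Priority limits unmodelled because this page gives no figures for them, and ignores prioritisation and Fair Share entirely. The names `job_state`, `MAX_WALLTIME` and `MAX_RUNNING_JOBS` are illustrative.

```python
from datetime import timedelta

# Limits stated for Basic QoS on this page. No numeric limits are given here
# for Standard or Priority, so they are deliberately absent from these tables.
MAX_WALLTIME = {"Basic": timedelta(hours=24)}
MAX_RUNNING_JOBS = {"Basic": 1}


def job_state(qos: str, walltime: timedelta, running_jobs: int) -> str:
    """Classify a job under the simplified limits: rejected, pending or eligible."""
    limit = MAX_WALLTIME.get(qos)
    if limit is not None and walltime > limit:
        return "rejected"  # longer than the QoS allows, so it cannot be submitted
    max_running = MAX_RUNNING_JOBS.get(qos)
    if max_running is not None and running_jobs >= max_running:
        return "pending"  # accepted, but waits until the running job finishes
    return "eligible"  # may start once the scheduler finds resources for it


print(job_state("Basic", timedelta(hours=36), running_jobs=0))  # rejected
print(job_state("Basic", timedelta(hours=12), running_jobs=1))  # pending
print(job_state("Basic", timedelta(hours=24), running_jobs=0))  # eligible
```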

Costs of additional access

Although MSD users have only restricted Basic QoS access to the service, Standard and Priority access credits can be bought on research grants or from other internal funds, with the funds transferred to ARC via Inter-departmental Transfer (IDT). The credit unit used on the ARC service is the core hour (CPU.h) for CPU compute and the GPU hour (GPU.h) for GPU resources. Charges have been determined using the TRAC fEC (full Economic Costing) methodology.

Under TRAC/fEC, in the 2020/21 Financial Year (Aug-July) the cost of a Standard QoS core hour is £0.01/Core.h. The cost of a Priority core hour currently stands at £0.02/Core.h. GPU resources are charged at £0.08/Core.h.
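Since charges are per credit unit, the cost of a purchase is the number of units multiplied by the rate, where a job's core hours are its core count multiplied by its wall clock hours. Below is a minimal sketch using the 2020/21 rates quoted above; the function names are illustrative, and it assumes the £0.08 GPU rate is applied per GPU hour (the GPU credit unit named earlier), which is worth confirming with the ARC team. `Decimal` is used so that currency amounts are exact.

```python
from decimal import Decimal

# 2020/21 rates quoted above, in GBP per credit unit.
RATES_GBP = {
    "standard_core_hour": Decimal("0.01"),
    "priority_core_hour": Decimal("0.02"),
    "gpu_hour": Decimal("0.08"),  # assumption: charged per GPU hour (GPU.h)
}


def core_hours(cores: int, wall_hours: float) -> Decimal:
    """Core hours consumed by a job: cores multiplied by wall clock hours."""
    return Decimal(cores) * Decimal(str(wall_hours))


def credit_cost(units: Decimal, rate_key: str) -> Decimal:
    """Cost in GBP of a number of credit units, rounded to the penny."""
    return (Decimal(units) * RATES_GBP[rate_key]).quantize(Decimal("0.01"))


job = core_hours(48, 10)  # a 48-core job running for 10 hours = 480 core hours
print(credit_cost(job, "standard_core_hour"))  # 4.80
print(credit_cost(job, "priority_core_hour"))  # 9.60
print(credit_cost(Decimal(1000), "gpu_hour"))  # 80.00
```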

For further information, please refer to the Requesting Usage Credit page. Please contact the ARC Support Team at support@arc.ox.ac.uk with any further questions.

Availability and planned maintenance

General availability

Every reasonable effort will be made to keep ARC resources available and operational 24 hours per day and 7 days per week.

Please note, however, that although support staff will do their best to keep the facility running at all times, we cannot guarantee prompt resolution of problems outside UK office hours, at weekends or on public holidays. Nevertheless, please report issues to support@arc.ox.ac.uk whenever they arise.

Exceptional maintenance and unplanned disruptions

Despite best efforts, it may occasionally become necessary to reduce or withdraw service at short notice and/or outside the planned maintenance time slot. This may happen, for example, for environmental reasons such as an air-conditioning or power failure, or in an emergency where immediate shutdown is required to protect equipment or data.

We expect such situations to arise rarely; when they do, service will be restored as quickly as possible.

All ARC users are automatically subscribed to the low-traffic "arc-announce" mailing list, which carries urgent announcements of any service disruption. Additionally, the main IT Services status page can be used to check the status of major issues.