ARC Cluster Description

 
At the centre of the ARC service are two high-performance compute clusters, arc and htc, connected via a single interface. arc is the system designed for multi-node parallel computation; htc is designed for high-throughput, lower core count jobs. htc also has nodes with GPU cards installed for GPGPU computing, as well as other novel resources. Users get access to both clusters automatically as part of the process of obtaining an account with ARC, and can use either or both.

 

For more detailed information on the hardware specifications of these clusters, see the tables below:

Cluster: arc
Description: Our largest compute cluster. Optimised for large parallel jobs spanning multiple nodes. Scheduler prefers large jobs. Offers low-latency interconnect (Mellanox HDR 100).
Login node: arc-login
Compute nodes: CPU: 48-core Cascade Lake (Intel Xeon Platinum 8268 @ 2.90GHz); Memory: 392GB
Minimum job size: 1 core
Notes: Non-blocking island size is 2212 cores.

Cluster: htc
Description: Optimised for single-core jobs and SMP jobs up to one node in size. Scheduler prefers small jobs. Also caters for jobs requiring resources other than CPU cores (e.g. GPUs).
Login node: htc-login
Compute nodes: CPUs: mix of Broadwell, Haswell, Cascade Lake; GPUs: P100, V100, A100, RTX; Novel architectures: KNL
Minimum job size: 1 core
Notes: Jobs will only be scheduled onto a GPU node if requesting a GPU resource.
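As an illustration of the note above, a minimal submission script for a single-GPU job on htc might look like the following. This is a sketch only: it assumes the Slurm scheduler, and the job name, time limit, memory request and program path are placeholders to adapt to your own work.

```shell
#!/bin/bash
# Illustrative sketch: assumes the Slurm scheduler; partition name taken
# from the tables above, everything else is a placeholder.
#SBATCH --job-name=gpu-example
#SBATCH --partition=htc        # GPU nodes live in the htc partition
#SBATCH --gres=gpu:1           # request one GPU; without a GPU request,
                               # the job will not be placed on a GPU node
#SBATCH --time=01:00:00
#SBATCH --mem=16G

nvidia-smi                     # confirm which GPU was allocated
./my_gpu_program               # placeholder for your own executable
```

Submit with `sbatch` as usual; the key point is the `--gres=gpu:1` line, which is what makes the scheduler consider GPU nodes at all.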

 

 

Operating system

The ARC systems use the Linux operating system (specifically CentOS 8), which is commonly used in HPC. We do not have any HPC systems running Windows (or macOS). If you are unfamiliar with using Linux, please consider:

  • Finding introductory Linux resources online (through Google/Bing/Yahoo etc.).
  • Working through our brief Introduction to Linux course.
  • Attending our Introduction to ARC training course (this does not teach you how to use Linux, but the examples will help you gain a greater understanding).
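As a taste of what such courses cover, here are a few everyday shell commands of the kind you will use constantly on ARC. The directory and file names are purely illustrative.

```shell
# A handful of everyday Linux commands; all names here are illustrative.
mkdir -p myproject            # create a directory (no error if it exists)
cd myproject                  # move into it
echo "hello" > notes.txt      # create a small text file
ls -l                         # list the directory contents in detail
cat notes.txt                 # print the file's contents
cd ..                         # go back up one level
```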

GPU Resources

 

The htc cluster has a number of GPU nodes. Most of these are in the "htc" partition; the M40, P4 and Titan RTX cards are in the "htc-nova" partition.
The following table (containing data from http://www.nvidia.com/object/tesla-servers.html and https://developer.nvidia.com/cuda-gpus) describes the characteristics of each GPU card.
  • Tesla K40: 1 x Kepler GK110B, 12 GB memory, ECC, 2880 CUDA cores, CUDA compute capability 3.5
  • Tesla K80: 2 x Kepler GK210, 24 GB memory, ECC, 4992 CUDA cores, CUDA compute capability 3.7
  • Tesla M40: 1 x Maxwell GM200, 24 GB memory, no ECC, 3072 CUDA cores, CUDA compute capability 5.2
  • Tesla P4: 1 x Pascal GP104, 8 GB memory, ECC, 2560 CUDA cores, CUDA compute capability 6.1
  • Tesla P100: 1 x Pascal GP100, 16 GB memory, ECC, 3584 CUDA cores, CUDA compute capability 6.0
  • Tesla V100: 1 x Volta GV100, 16 GB/32 GB memory, ECC, 5120 CUDA cores, CUDA compute capability 7.0

 

The following GPU nodes are available in Arcus-HTC. More information on how to access GPU nodes is available.

Feature name | Nodes | CPU/node | Cores/node | Mem/node | Cards/node | GPU memory | NVLink | Hostname[s] | Stakeholder
K40 | 9 (+1 devel) | 2 x Intel Xeon E5-2640 v3 @ 2.6GHz | 16 | 64 GB | 2 x K40m | 12 GB | no | arcus-htc-gpu[009-018] | ARC
K80 | 4 | 2 x Intel Xeon E5-2640 v3 @ 2.6GHz | 16 | 64 GB | 2 x K80 | 12 GB | no | arcus-htc-gpu[019-022] | ARC
M40 | 1 | 1 x Intel Xeon E5-1650 @ 3.20GHz | 12 | 64 GB | 1 x M40 | 24 GB | no | arcus-htc-gpu008 | ARC
P4 | 1 | 1 x Intel Xeon E5-1650 @ 3.20GHz | 12 | 64 GB | 1 x P4 | 8 GB | no | arcus-htc-gpu008 | ARC
P100 | 5 | 2 x Intel Xeon Gold 5120 @ 2.20GHz | 28 | 384 GB | 4 x P100 | 16 GB | no | arcus-htc-gpu[001-005] | Torr Vision Group
P100 | 1 | 2 x Intel Xeon E5-2640 v4 @ 2.40GHz | 16 | 128 GB | 2 x P100 | 12 GB | no | arcus-htc-gpu023 | Torr Vision Group
P100 | 1 | 1 x Intel Xeon E5-1660 v4 @ 3.20GHz | 8 | 128 GB | 2 x P100 | 16 GB | no | arcus-htc-gpu029 | ARC
TITAN RTX | 3 | Intel Xeon Silver 4112 @ 2.60GHz | 8 | 192 GB | 4 x TITAN RTX | 24 GB | pairwise | arcus-htc-gpu[026-028] | Applied Artificial Intelligence Lab
V100 | 2 | 2 x Intel Xeon Gold 5120 @ 2.20GHz | 28 | 384 GB | 4 x V100 | 16 GB | no | arcus-htc-gpu[006-007] | Torr Vision Group
V100 | 1 | 2 x Intel Xeon E5-2698 v4 @ 2.20GHz | 40 | 512 GB | 8 x V100 | 16 GB | yes | arcus-htc-dgx1v | ARC
V100 | 2 | 2 x Intel Xeon Gold 5120 @ 2.20GHz | 28 | 384 GB | 4 x V100 | 16 GB | yes | arcus-htc-gpu[024-025] | Dell UK
V100 | 2 | 2 x Intel Xeon Gold 5120 @ 2.20GHz | 28 | 384 GB | 4 x V100 | 32 GB | yes | arcus-htc-gpu[030-031] | Torr Vision Group
V100-LS | 5 | 2 x Intel Xeon E5-2698 v4 @ 2.20GHz | 40 | 512 GB | 8 x V100 LS | 32 GB | yes | arcus-htc-maxq[001-005] | ARC

Please note: Some machines in the above table have been purchased by specific departments/groups and are hosted by the ARC team (see Stakeholder column for details). These machines are available for general use, but may have job time-limit restrictions and/or occasionally be reserved for exclusive use of the entity that purchased them.
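To target a particular card type from the table above, the card model can usually be named in the GPU request. The following is a sketch only: it assumes the Slurm scheduler and that the card model (e.g. v100) is configured as a gres type on this cluster; check the local documentation for the exact names before relying on them.

```shell
#!/bin/bash
# Sketch only: assumes Slurm, and that "v100" is a configured gres type
# on this cluster; the type name and counts are assumptions to verify.
#SBATCH --partition=htc
#SBATCH --gres=gpu:v100:2      # hypothetical: request two V100 cards
#SBATCH --time=02:00:00

nvidia-smi topo -m             # show GPU topology, e.g. NVLink pairing
```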

NVidia DGX Max-Q

The new nodes are a version of the NVIDIA Volta DGX-1 32GB V100 Server (offering 8x NVLinked Tesla V100 32GB GPUs) using the slightly lower clock speed V100-SXM2-32GB-LS version of the Volta cards. The systems have 40 CPU cores (E5-2698 v4 @ 2.20GHz CPUs) and 512GB of system memory.

The plots below show typical benchmark results between the DGX1V and DGX Max-Q:

 

[Figure: typical GROMACS benchmark results, DGX1V vs DGX Max-Q]
[Figure: TensorFlow benchmark results, DGX1V vs DGX Max-Q]