ARC Systems

At the centre of the ARC service are two high-performance compute clusters, 'arc' and 'htc'. arc is designed for multi-node parallel computation; htc is designed for high-throughput operation (lower core-count jobs). htc is a more heterogeneous system offering different types of resources, such as GPGPU computing and high-memory nodes; nodes on the arc cluster are uniform. Users get access to both clusters automatically as part of the process of obtaining an account with ARC, and can use either or both.

Details on the system configuration are:

Capability cluster (arc)

The capability system - cluster name arc - has a total of 305 worker nodes with 48 cores each, some of which are co-investment hardware. It offers a total of 14,640 CPU cores.

All nodes have the following:

  • 2x Intel Platinum 8628 CPUs. The Platinum 8628 is a 24-core, 2.90GHz Cascade Lake CPU, giving 48 CPU cores per node.
  • 384GB memory
  • HDR 100 InfiniBand interconnect. The fabric has a 3:1 blocking factor with non-blocking islands of 44 nodes (2,112 cores).

OS is CentOS Linux 8.1. Scheduler is SLURM.

The login node for the system is 'arc-login.arc.ox.ac.uk', which allows logins from the University network range (including VPN).
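
For example, logging in and submitting a simple multi-node MPI job might look like the sketch below; the module name and application binary are illustrative placeholders, not ARC-specific values, and your own resource requests will differ:

    # Log in from the University network range (or VPN)
    ssh <username>@arc-login.arc.ox.ac.uk

    #!/bin/bash
    # submit.sh -- a minimal SLURM batch script for two 48-core arc nodes (96 cores)
    #SBATCH --job-name=mpi-example
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=48
    #SBATCH --time=01:00:00
    module load mpi                  # module name is an assumption; check what is installed
    mpirun ./my_mpi_program          # placeholder application

    # Submit the script and check the queue
    sbatch submit.sh
    squeue -u $USER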

More details on the available nodes:

Node count | Core count | Nodes | Stakeholder | Comments
45 | 2,160 | arc-c[001-045] | Quantum Hub | available for general use for short jobs (<12hrs)
8 | 384 | arc-c[046-053] | Earth Science | available for general use for short jobs (<12hrs)
252 | 12,096 | arc-c[054-305] | ARC |

 

Throughput cluster (htc)

The throughput system - cluster name htc - currently has 25 worker nodes, some of which are co-investment hardware. Note that additional nodes will migrate into this system in the coming weeks.

19 of the nodes are GPGPU nodes. More information on how to access GPU nodes is available.

2 of the nodes are High Memory nodes with 3TB of RAM.

OS is CentOS Linux 8.1. Scheduler is SLURM.

The login node for the system is 'htc-login.arc.ox.ac.uk', which allows logins from the University network range (including VPN).
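
As a sketch of how the GPU and high-memory resources above are typically requested through SLURM, a job script might include the directives below; the typed GPU request and the memory figure are assumptions rather than confirmed htc settings:

    #!/bin/bash
    # gpu-example.sh -- minimal sketch of a single-GPU job on htc
    #SBATCH --job-name=gpu-example
    #SBATCH --time=02:00:00
    #SBATCH --gres=gpu:1             # one GPU of any type; a specific model could be
                                     # requested with a typed gres such as --gres=gpu:v100:1,
                                     # but the type names used on htc are an assumption here
    nvidia-smi                       # show the GPU(s) allocated to the job

    # A job aimed at the 3TB high-memory nodes would instead request a large
    # memory allocation, for example with '#SBATCH --mem=1500G'.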

Details on the nodes are:

Node count | CPU | Cores per node | Memory per node | GPUs per node | GPU type | NVLink | Interconnect | Stakeholder | Notes
2 | Intel E5-2640v3 (Haswell), 2.60GHz | 16 | 64GB | - | - | - | - | ARC |
2 | Intel E5-2640v4 (Broadwell), 2.60GHz | 20 | 128GB | - | - | - | - | ARC |
2 | Intel Platinum 8628 (Cascade Lake), 2.90GHz | 48 | 3TB | - | - | - | HDR 100 | ARC |
8 | Intel Platinum 8628 (Cascade Lake), 2.90GHz | 48 | 384GB | 2 | V100 | no | HDR 100 | Quantum Hub | available for short jobs (<12hrs)
4 | Intel Platinum 8628 (Cascade Lake), 2.90GHz | 48 | 384GB | 4 | A100 | no | HDR 100 | ARC |
6 | Intel Platinum 8628 (Cascade Lake), 2.90GHz | 48 | 384GB | 4 | RTX8000 | no | HDR 100 | ARC |
1 | AMD Epyc 7452 (Rome), 2.35GHz | 64 | 1TB | 4 | A100 | no | - | Wes Armour | available for short jobs (<12hrs)

 

Legacy clusters

Arcus-B

Arcus-B is still available via login to 'arcus-b.arc.ox.ac.uk'. We advise all users of this system to move their workload to the arc cluster as soon as possible.

Arcus-HTC

The Arcus-HTC system is still available for use. We will be migrating most of its GPGPU hardware to the new htc cluster in the coming weeks; please at least test your workloads on the new system.

Neither legacy cluster (Arcus-B, Arcus-HTC) will be available after August 2021.

GPU Resources

ARC has a number of GPU nodes. Most of these are in the "htc" cluster or in the (deprecated) Arcus-HTC system, awaiting migration.

The following table (containing data from http://www.nvidia.com/object/tesla-servers.html and https://developer.nvidia.com/cuda-gpus) describes the characteristics of each GPU card.

GPU card | Architecture | Memory size | ECC | CUDA cores | CUDA Compute Capability
Tesla K40 | Kepler | 12GB | yes | 2880 | 3.5
Tesla K80 | Kepler | 24GB | yes | 4992 | 3.7
Tesla M40 | Maxwell | 24GB | no | 3072 | 5.2
Tesla P4 | Pascal | 8GB | yes | 2560 | 6.1
Tesla P100 | Pascal | 16GB | yes | 3584 | 6.0
Tesla V100 | Volta | 16GB/32GB | yes | 5120 | 7.0
Titan RTX | Turing | 24GB | no | 4608 | 7.5
Quadro RTX 8000 | Turing | 48GB | yes | 4608 | 7.5
Tesla A100 | Ampere | 40GB/80GB | yes | 6912 | 8.0
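
For reference, the compute capability determines which architecture flags CUDA code should be compiled with. A hedged example of building a single source file (the file name is a placeholder) for several of the cards above:

    # Build kernel.cu for V100 (sm_70), Titan RTX / Quadro RTX 8000 (sm_75) and A100 (sm_80)
    nvcc -gencode arch=compute_70,code=sm_70 \
         -gencode arch=compute_75,code=sm_75 \
         -gencode arch=compute_80,code=sm_80 \
         -o kernel kernel.cu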

 

The following GPU nodes are available in Arcus-HTC; for details on the new htc cluster, see the table above. More information on how to access GPU nodes is available.

Feature name | Node count | CPU/node | Cores/node | Mem/node | Cards/node | GPU memory | NVLink | Hostname[s] | Stakeholder
K40 | 9 (+1 devel) | 2x Intel Xeon E5-2640 v3 @ 2.60GHz | 16 | 64GB | 2x K40m | 12GB | no | arcus-htc-gpu[009-018] | ARC
K80 | 4 | 2x Intel Xeon E5-2640 v3 @ 2.60GHz | 16 | 64GB | 2x K80 | 12GB | no | arcus-htc-gpu[019-022] | ARC
M40 | 1 | 1x Intel Xeon E5-1650 @ 3.20GHz | 12 | 64GB | 1x M40 | 24GB | no | arcus-htc-gpu008 | ARC
P4 | 1 | 1x Intel Xeon E5-1650 @ 3.20GHz | 12 | 64GB | 1x P4 | 8GB | no | arcus-htc-gpu008 | ARC
P100 | 5 | 2x Intel Xeon Gold 5120 @ 2.20GHz | 28 | 384GB | 4x P100 | 16GB | no | arcus-htc-gpu[001-005] | Torr Vision Group
P100 | 1 | 2x Intel Xeon E5-2640 v4 @ 2.40GHz | 16 | 128GB | 2x P100 | 12GB | no | arcus-htc-gpu023 | Torr Vision Group
P100 | 1 | 1x Intel Xeon E5-1660 v4 @ 3.20GHz | 8 | 128GB | 2x P100 | 16GB | no | arcus-htc-gpu029 | ARC
TITAN RTX | 3 | Intel Xeon Silver 4112 @ 2.60GHz | 8 | 192GB | 4x TITAN RTX | 24GB | pairwise | arcus-htc-gpu[026-028] | Applied Artificial Intelligence Lab
V100 | 2 | 2x Intel Xeon Gold 5120 @ 2.20GHz | 28 | 384GB | 4x V100 | 16GB | no | arcus-htc-gpu[006-007] | Torr Vision Group
V100 | 1 | 2x Intel Xeon E5-2698 v4 @ 2.20GHz | 40 | 512GB | 8x V100 | 16GB | yes | arcus-htc-dgx1v | ARC
V100 | 2 | 2x Intel Xeon Gold 5120 @ 2.20GHz | 28 | 384GB | 4x V100 | 16GB | yes | arcus-htc-gpu[024-025] | Dell UK
V100 | 2 | 2x Intel Xeon Gold 5120 @ 2.20GHz | 28 | 384GB | 4x V100 | 32GB | yes | arcus-htc-gpu[030-031] | Torr Vision Group
V100-LS | 5 | 2x Intel Xeon E5-2698 v4 @ 2.20GHz | 40 | 512GB | 8x V100-LS | 32GB | yes | arcus-htc-maxq[001-005] | ARC

Please note: Some machines in the above table have been purchased by specific departments/groups and are hosted by the ARC team (see Stakeholder column for details). These machines are available for general use, but may have job time-limit restrictions and/or occasionally be reserved for exclusive use of the entity that purchased them.

NVIDIA DGX Max-Q

The new nodes are a version of the NVIDIA Volta DGX-1 32GB V100 Server (offering 8x NVLinked Tesla V100 32GB GPUs) using the slightly lower clock speed V100-SXM2-32GB-LS version of the Volta cards. The systems have 40 CPU cores (E5-2698 v4 @ 2.20GHz CPUs) and 512GB of system memory.

The plots below show typical benchmark results between the DGX1V and DGX Max-Q:

 

[Figure: typical GROMACS benchmark results, DGX1V vs DGX Max-Q]
[Figure: TensorFlow benchmark results, DGX1V vs DGX Max-Q]

 

Storage

Our cluster systems share 2PB of high-performance GPFS storage.

Software

Users may find that the application they are interested in running has already been installed on at least one of the systems. Users are welcome to request the installation of new applications and libraries, or updates to already installed applications, via our software request form.
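
Installed software is typically exposed through environment modules on HPC systems; assuming that is the case here (the application name below is a placeholder), checking and loading software looks like:

    # List the software modules available on the current cluster
    module avail

    # Load an application before using it interactively or in a job script
    module load <application-name>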