
News

ARC Scheduled Downtime: Tues 15th - Wed 16th September 2015

As part of the regular ARC maintenance schedule, we are announcing a 48-hour downtime period for all ARC systems (and systems run by ARC) from Tuesday 15th to Wednesday 16th September 2015 (inclusive). During this downtime period, it will not be possible to log in to ARC systems or run any jobs.

If this scheduled downtime period conflicts with your critical deadline or training course, please get in contact with us as soon as possible at support@arc.ox.ac.uk.

Read our page on Scheduled Downtime for more information.

 

ARC HPC Showcase Event 2016: 21st April

Following the success of this year's ARC HPC Showcase event, planning has started for next year's event, which will be held on Thursday 21st April 2016.

**Save the Date!!**

 

 

ARC HPC Showcase Event 2015: 14th April, 2pm onwards

Public Event: Open to all, no registration required.

Come and join Advanced Research Computing (ARC) at an open event showcasing work from a variety of research groups, illustrating how HPC can be used to innovate and accelerate research.

The event will be held at the Oxford e-Research Centre, 7 Keble Road, Oxford, OX1 3QG.

The day will also present opportunities for networking and meeting several HPC vendors, and will mark the official launch of the new ARC compute service.

For those attending the ARC Showcase Morning session (invite only), the day starts at 10am.

** Bloodhound Replica car and driving simulator on display from 9am to 4.30pm ** 

On the day we are also supporting the Bloodhound SSC project, to inspire the next generation of HPC developers and HPC users! The Bloodhound full-size replica car will be on display outside the OeRC, and inside, the Bloodhound Driving Experience will also be available so you can have a go at driving at 1000mph. A talk will also be given at 3pm by the Bloodhound Education team (first come, first served on the day).


 

http://www.bloodhoundssc.com/news/arc-high-performance-computing-showcas...

 

 

New ARCUS Phase B Cluster (17/03/2015)

The new ARCUS Phase B cluster was delivered to ARC in early January. Since then we have been working with OCF to install and deploy the new service. A significant milestone was reached on 2nd March when, after several years' service, the existing production machines HAL and SAL were decommissioned.

We are currently in the early adopter phase, ironing out small bugs and implementation issues on the service. Shortly we will be announcing full access for all and officially launching the new service on 14th April.

The following link is to a copy of the flyer advertising our ARC HPC Showcase event (Launch Day): http://bit.ly/aprli14. If you would like a higher-quality PDF to print and display locally, to encourage people to attend the open afternoon, please let me know. Please do come along on the day to celebrate HPC in Oxford.

Local and Regional services update (30 Jul 2014)

Funding changes:

First, from 1st August we will be changing to operate as an SRF-type facility. This affects those users who apply for grant income to cover the cost of compute time on ARC. From 1st August, under SRF guidelines, our unit cost will reduce to 2p per core hour. (For reference, the current fEC rate under the MRF model is 5p per core hour.) We strongly encourage all users who can apply for compute time on grant applications to do so using this new SRF rate.
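
As an illustrative example of the difference: a project budgeting for 100,000 core hours would be costed at £2,000 under the new SRF rate of 2p per core hour, compared with £5,000 at the 5p MRF rate.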

Service updates:
ARC

Locally, we were successful in July with our capital funding bid to provide new compute capacity within Oxford. We are currently working towards having this new service in production by January 2015. Until then, all current local services (including HAL and SAL) will remain operational. We expect the new HPC cluster to bring in around 4000 cores of new capacity.

In addition to our own investment in a new cluster, I'd also like to remind those looking to build and host their own HPC clusters to come and talk to us. With our new investment there are opportunities for joint procurement at this time to increase the overall capacity of the new service, and as before we continue to offer the HPC Hosting service. If you are interested, please contact us to discuss in more detail.

Regionally, we will be withdrawing from the IRIDIS service as of 30th September 2014 (at the CFI meeting last week this was pushed back from 1st September to ensure our users have time to move off the service; the cost difference is minimal, all things considered). We will remain in Emerald until at least the end of March 2015. Subject to funding and general continuation of the service beyond the end of March, we will look to remain engaged until possibly July 2015.

SES: CFI (IRIDIS and EMERALD)

Regionally, some difficult decisions have had to be made in conjunction with the ARC Executive Committee. The current regional resources, Emerald and Iridis, reach end of life next March (2015). Discussions over how best to use the local funding that exists have led to the decision to withdraw from Iridis at the end of September 2014, but to remain involved with Emerald until at least March 2015. Though Emerald reaches end of life in March next year, there is interest from all members of the regional consortium in keeping Emerald alive as long as possible. If possible, we are looking to keep Emerald operational as a service until July 2015.

This means that those current users of Iridis will need to ensure that all data is copied from the service before the end of September 2014.

New development queue on production clusters (25 Jun 2013)

Following discussion at the OSC User Forum in May 2013, a new development queue has been added to the Torque job scheduler on the production cluster systems (hal, sal and arcus).

This queue is for jobs that run for a short time and is intended to allow users to test that the jobs they submit will run correctly.  Currently there are two nodes on each of hal and arcus reserved for develq jobs, so the turnaround time for jobs submitted to the develq should be shorter than for jobs in the standard work queues.  The maximum walltime for jobs in the development queue is 10 minutes.

To submit jobs to the develq, please pass the queue name as an option to the qsub command:

qsub -q develq submit-script.sh

If your submission script's PBS directives specify a walltime greater than 10 minutes, this can be overridden on the qsub command line.  For example, if submit-script.sh has a walltime of 96 hours specified as:

#PBS -l walltime=96:00:00

you can still submit the script to the develq by specifying a walltime on the qsub command line:

qsub -q develq -l walltime=00:10:00 submit-script.sh
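
Alternatively, the queue and a suitable walltime can be set directly in the submission script through PBS directives. A minimal sketch (the resource request and executable name are placeholders, assuming a 16-core Arcus-style node):

#PBS -q develq
#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=16

cd $PBS_O_WORKDIR
./my_test_program            # placeholder executable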

Torque replaces PBS-Pro (7 March 2013)

We are in the process of switching from the commercial product PBS-Pro to the open-source Torque for our batch system.  (The Torque resource manager will also be coupled with the Maui job scheduler.)

How does this change affect submission scripts?

Very little.  There are three changes users should be aware of.

First, resource allocation is done through the directives nodes and ppn (processes per node), which replace select and mpiprocs respectively.  For instance, a 32-way MPI run on 2 compute nodes needs #PBS -l nodes=2:ppn=16 in the Torque submission script.
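
For illustration, a minimal Torque submission script for such a 32-way MPI job might look like the sketch below (the module name, job name and executable are placeholders; check module avail for what is actually installed):

#PBS -l nodes=2:ppn=16
#PBS -l walltime=12:00:00
#PBS -N mpi-example

cd $PBS_O_WORKDIR
module load openmpi          # placeholder module name
mpirun -np 32 ./my_mpi_app   # 2 nodes x 16 processes per node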

Second, the submission of job arrays is slightly different in Torque than in PBS-Pro, using different flags and environment variables; a brief sketch is given below.
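
As a rough sketch (the script and input names are placeholders): Torque submits job arrays with the -t flag and exposes the task index as the PBS_ARRAYID environment variable, whereas PBS-Pro uses -J and PBS_ARRAY_INDEX.

qsub -t 1-10 submit-script.sh

Inside submit-script.sh the task index can then be used to select an input, for example:

./my_program input-${PBS_ARRAYID}.dat   # placeholder program and input naming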

Finally, although the use of the ncpus directive, familiar to users of Caribou, continues to be supported, we encourage all users to request resources through the nodes directive.  For example, a Caribou job that needs to run on 8 cores using 100GB of memory needs #PBS -l nodes=1 in the submission script; this will allocate an entire Caribou node on an exclusive-use basis, giving access to 16 cores and 128GB of RAM.  As before, it is up to the application and/or submission script to set the environment for a threaded execution on the correct number of cores, e.g. through setting OMP_NUM_THREADS.
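
As a sketch of that last point (the executable name is a placeholder), an 8-thread OpenMP job on an exclusively allocated Caribou node might set the thread count explicitly:

#PBS -l nodes=1
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=8     # use 8 of the node's 16 cores
./my_openmp_app              # placeholder executable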

We have prepared a Torque guide, which describes all of these changes in detail.  Job submission script examples are also provided for the more popular applications.

 

Arcus - the new Dell cluster (18 Oct 2012)

Arcus is the name of the Dell Sandybridge cluster that OSC purchased during the last quarter of 2012.  It can be accessed through a remote login to arcus.oerc.ox.ac.uk.  It has three login nodes and 80 compute nodes, each with 16 physical cores.  More details here.

What MPI should I use on Arcus?

Arcus has several MPI library stacks available as modules for general use: OpenMPI, MVAPICH and Intel MPI.  While OpenMPI and MVAPICH were custom-built by the OSC team to use the cluster's QLogic InfiniBand communication fabric, Intel MPI has to be instructed to use the QLogic proprietary PSM layer (by passing the extra argument -psm to mpirun).  OpenMPI and MVAPICH are available in several versions, built with both the Intel and the GNU compilers.  More information on how to compile and run MPI applications is found here.
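
As a rough sketch of the workflow (module names and the source file are placeholders; the -psm argument for Intel MPI is as described above), compiling and running an MPI program might look like:

# OpenMPI or MVAPICH (placeholder module name; see module avail):
module load openmpi
mpicc -O2 -o my_mpi_app my_mpi_app.c
mpirun -np 32 ./my_mpi_app

# Intel MPI (placeholder module name), passing -psm so the QLogic PSM layer is used:
module load intel-mpi
mpicc -O2 -o my_mpi_app my_mpi_app.c
mpirun -psm -np 32 ./my_mpi_app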

What compilers are available on Arcus?

The Intel compilers are available and generally offer the best performance on Intel CPUs.  Several versions are available as modules.  The GNU compilers are also available, for software that was developed with them and fails to compile with other compilers.

Users should benefit from AVX vectorisation on the Intel Sandybridge CPUs, supported by both the Intel compilers and the newer versions of the GNU compilers.  This is an enhancement of the SSE vectorisation (supported by older CPU architectures) and is triggered by the -mavx flag in both the Intel and GNU compilers.  Compiling code that runs with AVX support on Intel Sandybridge CPUs and with SSE support on older CPUs is also possible with the Intel compilers, for instance using the flags -xsse4.1 -axavx; the manual pages for icc or ifort provide the details.
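
For illustration (source and output names are placeholders), the flags mentioned above would be used roughly as follows:

# AVX only, Intel or GNU compiler:
icc -O2 -mavx -o my_app my_app.c
gcc -O2 -mavx -o my_app my_app.c

# Intel compiler: SSE4.1 baseline plus an additional AVX code path:
icc -O2 -xsse4.1 -axavx -o my_app my_app.c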

Is HyperThreading turned on on Arcus?

Yes.  Thus, a parallel application can run, with a potential increase in performance, using 32 threads or processes on the 16 physical cores available.  Users are advised to benchmark their applications to determine whether HyperThreading is beneficial for their particular case.  To give two examples, while Gromacs was found to gain up to 20% in performance from HyperThreading, Gaussian was found to lose nearly as much when running with two threads per physical core.
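
A simple way to check this for an OpenMP code (the executable name is a placeholder; MPI codes would vary the process count via -np instead) is to time the same job with one and with two threads per physical core:

export OMP_NUM_THREADS=16
time ./my_app                # one thread per physical core

export OMP_NUM_THREADS=32
time ./my_app                # two threads per physical core (HyperThreading)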