Efficiently Running Serial Software

Introduction

Processing workloads often involve serial applications (i.e. single-threaded, single-process programs), which run on a single core. Where possible, such jobs should not be run on Arcus-B: Arcus-B allocates an entire node to the job, which is wasteful because most of the cores would sit idle. Arcus-HTC allows resources to be shared and allocates only one core to such a job, so Arcus-HTC is the best cluster to use for serial applications.

This guide gives tips on how to design a job script for a serial application. The assumption is that the workload involves a large number of similar but independent runs of the same application with different input parameters.

Using job arrays

In the scenario of a workload with a large number of independent runs, job arrays are a natural fit. Using job arrays, a number of jobs can be started with a single submission command and a single submission script. The job array index can be used within the submission script to identify the parameters each task works with (input or output files, command line arguments, etc.).

To give an example, assume serialapp has to run a parameter sweep, with values ranging from 1.01 to 1.96 in steps of 0.01. On Arcus-HTC, all the processing can be carried out by submitting one array job with the following job script:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --job-name=test
#SBATCH --time=00:30:00

# Map the array index (1 to 96) onto the sweep value (1.01 to 1.96)
RUN_PARAM=$(printf "1.%02d" $SLURM_ARRAY_TASK_ID)
serialapp $RUN_PARAM

submitted (assuming the job script is saved in a file called, say, jobscript.sh) using the command:

sbatch --array=1-96 jobscript.sh

The variable SLURM_ARRAY_TASK_ID is the job array index (set by Slurm when an array job is submitted) and runs from 1 to 96, in accordance with the submission command above. It could also be used, for example, to select a specific line from a file or to identify an input file.
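
As a minimal sketch of the former (the file name params.txt and its one-value-per-line layout are assumed here for illustration), the array task ID can be used to pick out one line of a parameter file:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --job-name=test
#SBATCH --time=00:30:00

# params.txt is assumed to contain one parameter value per line;
# array task N reads line N and passes it to serialapp.
RUN_PARAM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
serialapp $RUN_PARAM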

On Arcus-B

In the rare case that your application cannot run on Arcus-HTC, in order to fully utilise a node and use all the cores available, write your job script so that it starts a number of processes to run concurrently, rather than a single process. There are multiple ways to achieve this; one way is to start the processes in the background, i.e. detached from the shell in which they were started. Processes started in the background return control to the shell (almost) at once, which means a number of processes can be started (practically) at the same time, without one having to wait for the preceding process to finish. Processes are started in the background using the & (ampersand) character.

Another way is to use an application like GNU parallel (https://www.gnu.org/software/parallel/), a shell tool for executing jobs in parallel. GNU parallel is installed on ARC systems.

Example job script backgrounding tasks

Suppose the task is to run an application called serialapp, which takes one numeric command line argument. Several instances of serialapp can be started in the background with something like this:

serialapp 1.01 &
serialapp 1.02 &
...

Assuming serialapp runs for a time that is independent of the value of the argument, all the serialapp processes running concurrently finish in approximately the same time as a single process would.

Here is an example of a Slurm job script that starts a number of instances of serialapp, filling up an entire node:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --job-name=test
#SBATCH --time=00:30:00

serialapp 1.01 &
serialapp 1.02 &
serialapp 1.03 &
serialapp 1.04 &
serialapp 1.05 &
serialapp 1.06 &
serialapp 1.07 &
serialapp 1.08 &
serialapp 1.09 &
serialapp 1.10 &
serialapp 1.11 &
serialapp 1.12 &
serialapp 1.13 &
serialapp 1.14 &
serialapp 1.15 &
serialapp 1.16 &

wait

There are several things to notice in the above example:

  • A fixed number of processes are started (in this case 16, as that matches the number of CPU cores on most Arcus-B nodes). The 16 processes are (normally) scheduled by Linux to run on separate cores. If you start more processes than the machine has cores, execution will be very inefficient. (A loop-based version of this script is sketched after this list.)
  • The job uses one node and one node only; if the user requests two nodes (or more), the scheduler allocates the requested number of nodes, but all processes are started on the first node allocated, leaving the others idle. There is no mechanism through which processes "spill over" to the other allocated nodes.
  • Because the processes return control to the shell once started in the background, a wait command is needed right at the end. Its purpose is to force the job to wait for the processes started in the background to finish. Without this command, the scheduler considers the job finished (almost) immediately after it starts, and the processes are killed prematurely.
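
As mentioned in the first point above, a loop-based version of the script avoids writing out the sixteen commands by hand. The following is a minimal sketch, assuming (as in the example) 16 cores per node and the parameter values 1.01 to 1.16:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --job-name=test
#SBATCH --time=00:30:00

# Start one background process per core (16 cores assumed).
for i in $(seq 1 16); do
    serialapp $(printf "1.%02d" $i) &
done

# Wait for all background processes to finish.
wait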

Example parallel jobs using GNU parallel

GNU parallel is a shell tool for executing jobs in parallel. To achieve the same run of 'serialapp' using GNU parallel, one could use the following job script:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --job-name=test
#SBATCH --time=00:30:00

module load gnu-parallel

parallel <<EOF
serialapp 1.01
serialapp 1.02
serialapp 1.03
serialapp 1.04
serialapp 1.05
serialapp 1.06
serialapp 1.07
serialapp 1.08
serialapp 1.09
serialapp 1.10
serialapp 1.11
serialapp 1.12
serialapp 1.13
serialapp 1.14
serialapp 1.15
serialapp 1.16
serialapp 1.17
serialapp 1.18
serialapp 1.19
serialapp 1.20
EOF

(this uses a so-called 'here document', a block of text supplied inline on the command's standard input).

The advantages of using parallel are:

  • GNU parallel will always start as many tasks in parallel as there are cores (provided there are that many tasks), but will not oversubscribe the cores; if there are more tasks to run than there are cores on the machine, it runs as many as it can, waits for cores to be released, and then automatically starts the remaining tasks. So there is no need to match the number of applications started to the number of compute cores available, and it is no problem to have more tasks than cores (a sketch of this is shown after this list).
  • There is no need for 'wait' statements; parallel handles all the waiting for applications to finish internally.
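
As a minimal sketch of the first point (the 1.01 to 1.96 range is borrowed from the job array example above), the list of commands can be generated on the fly and piped into parallel, which keeps all cores busy until all 96 runs have finished:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --job-name=test
#SBATCH --time=00:30:00

module load gnu-parallel

# Generate the commands "serialapp 1.01" ... "serialapp 1.96" and let
# parallel run them, one task per core, until all have finished.
seq -f "serialapp 1.%02g" 1 96 | parallel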

It would also be possible to store the commands to run in a separate file, one per line (e.g. commands_for_serial), looking like this:

serialapp 1.01
serialapp 1.02
serialapp 1.03
serialapp 1.04
serialapp 1.05
serialapp 1.06
serialapp 1.07
serialapp 1.08
serialapp 1.09
serialapp 1.10
serialapp 1.11
serialapp 1.12
serialapp 1.13
serialapp 1.14
serialapp 1.15
serialapp 1.16
serialapp 1.17
serialapp 1.18
serialapp 1.19
serialapp 1.20

and then have a job script like this:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --job-name=test
#SBATCH --time=00:30:00

module load gnu-parallel

parallel < commands_for_serial

Advanced topics

There are two aspects to be aware of in connection with packing several processes per job in the way described above.

  • Memory restrictions: Ideally, the number of single-threaded processes per job should equal the number of cores available. If the total amount of memory required by all processes exceeds what is available per node, the Linux kernel (the out-of-memory killer, to be more precise) kills some of these processes. This will be reflected in error messages in the scheduler output for the job in question. The solutions are to
    • run the job on a node with more memory available or
    • reduce the number of processes started per job.
  • Placement control: In principle, the time it takes for all processes to execute concurrently matches the time taken by a single process to execute. (Thus, the advantage of the solution presented in this guide is proportional to the number of cores per node.) This is not true for some applications, which are scheduled by Linux suboptimally on the available cores, leading to resource contention and to performance degradation. In such cases, control over how the processes are placed might help. This can be achieved via the taskset command. For example, the following command starts the process serialapp on core 0 and binds it to that core (a fuller sketch follows the command below):

taskset -c 0 serialapp $RUN_PARAM
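
As a minimal sketch of this (assuming, as before, 16 cores numbered 0 to 15 and the parameter values 1.01 to 1.16), the background-process job script can pin each instance to its own core:

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --job-name=test
#SBATCH --time=00:30:00

# Pin instance i to core i (cores 0 to 15 assumed on the node).
for i in $(seq 0 15); do
    RUN_PARAM=$(printf "1.%02d" $((i + 1)))
    taskset -c $i serialapp $RUN_PARAM &
done

# Wait for all pinned background processes to finish.
wait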