Job Farming with GNU parallel
What is GNU parallel?
"GNU parallel is a shell tool for executing jobs in parallel using one or more computers." (documentation)
General Considerations
As with all job farming concepts, some statistics about the tasks' requirements (runtime, number of CPUs, memory) are needed in order to assess the requirements of the farming job as a whole. The goal of job farming is to achieve a high task throughput, but 100% is usually not attainable. The remaining tasks that failed (for one reason or another) must then be handled separately.
GNU parallel offers a large number of input options. Please consult the documentation, in particular the tutorial.
Currently, GNU parallel does not directly support task dependencies. For those, other tools or a hand-made solution must be used.
Slurm and GNU parallel Example
Many examples are given on the GNU parallel documentation page. We just illustrate a very common case.
A very simple mode of operation could be as follows.
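A minimal sketch of such a job script is shown below; the node count, time limit, the dummy program placement-test, and the parameter strings are placeholders that must be adapted to the actual workload (the --exact flag of srun may need to be --exclusive on older Slurm versions).

```bash
#!/bin/bash
#SBATCH -J jobfarm
#SBATCH --nodes=2                 # placeholder: total number of nodes
#SBATCH --ntasks-per-node=28      # e.g. CoolMUC-2: 28 physical cores per node
#SBATCH --time=02:00:00

# one parameter string per task (placeholders)
PARAMS=( "-d 10" "-d 20" "-d 30" "-d 40" )

# task <id> <params>: runs one task as a separate job step
task() {
  local id="$1"
  local params="$2"
  # one core per task here; adapt -n, -c and further srun options to the real task
  srun --nodes=1 --ntasks=1 --exact ./placement-test $params > "task_${id}.log" 2>&1
}
export -f task

# run at most SLURM_NTASKS tasks concurrently; {#} is the task sequence number,
# {} the corresponding entry of PARAMS
parallel -j "$SLURM_NTASKS" task {#} {} ::: "${PARAMS[@]}"
```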
The Slurm SBATCH header is kept quite general: it specifies the number of nodes and the number of srun tasks that may run in parallel. The actual resource requirements per task are specified in the srun parameter list.
The bash function task performs the task's actual workflow. A task is defined here as the execution of placement-test (just a dummy test program; replace it with your own program) with different input parameters, here given by the bash array PARAMS. Instead of task, a bash script or any other program is also conceivable. The first parameter the function expects is the task sequence number ({#}). We use it to index the tasks with an ID and to parameterize the log files, but other uses are conceivable. PARAMS can, for example, also contain working directories, which can be passed to srun (via its -D option). It may be more convenient to read PARAMS from a file using readarray, but GNU parallel also allows for direct file input, as sketched below.
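For instance, reading the parameters from a file (here a hypothetical params.txt with one parameter string per line) could look like this:

```bash
# read the parameter list from a file instead of hard-coding it
readarray -t PARAMS < params.txt
parallel -j "$SLURM_NTASKS" task {#} {} ::: "${PARAMS[@]}"

# alternatively, let GNU parallel read the file directly (:::: instead of :::)
parallel -j "$SLURM_NTASKS" task {#} {} :::: params.txt
```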
All this allows for a rather flexible setup of general job or task farming concepts. However, for this to work, the correct resource requirements for each task must be specified. We cannot give examples for all possible variants here, since MPI programs or hybrid MPI-OpenMP programs can also be run as tasks. To obtain the correct placement of ranks and threads onto the CPUs, we recommend starting with dummies (like placement-test), or using the --verbose option of srun in order to see the CPU masks explicitly. Using sacct, the job steps created by srun can be scrutinized for where and when they ran, for how long, and how much memory was used (peak). To gain some confidence in how this works, we recommend playing with some dummy tasks on the login nodes (then without srun) or in the interactive queue. GNU parallel also has a --dry-run option for checking that the workflow is at least in principle correct. Still, expect that other things can go wrong!
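For illustration (the job ID and the sacct field selection are just examples):

```bash
# print the commands that GNU parallel would execute, without running them
parallel --dry-run -j "$SLURM_NTASKS" task {#} {} ::: "${PARAMS[@]}"

# after the job has finished: inspect the job steps created by srun
sacct -j <jobid> --format=JobID,Start,End,Elapsed,MaxRSS,NodeList
```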
Although GNU parallel can in principle also spawn tasks on other nodes directly via SSH, we discourage this usage. Slurm is the default resource manager and scheduler on our systems, so using srun as illustrated is strongly recommended.
Nice features of GNU parallel
GNU parallel has some options like --eta, for getting a (very rough) estimate of the time to completion, and --joblog, --resume, --resume-failed, and --retry-failed. The latter allow for a simple but effective way to do bookkeeping of the tasks. For this to work, task must return a success (0) or failure (non-zero) exit code. Using set -e inside task could therefore be a good idea. The resume flags are great for cases where a farming job failed (e.g. due to a node failure) or simply timed out (Slurm wall clock limits apply to our queues). Such a job can be restarted by simply adding one of these flags.
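A sketch of how these flags fit together (farm.log is just a placeholder file name):

```bash
# first run: record exit code and runtime of every task in a job log
parallel --joblog farm.log -j "$SLURM_NTASKS" task {#} {} ::: "${PARAMS[@]}"

# after a node failure or timeout: resubmit with --resume added;
# tasks already recorded as successful in farm.log are skipped
parallel --resume --joblog farm.log -j "$SLURM_NTASKS" task {#} {} ::: "${PARAMS[@]}"

# rerun only the failed tasks, taking the commands from the job log
parallel --retry-failed --joblog farm.log -j "$SLURM_NTASKS" task {#} {} ::: "${PARAMS[@]}"
```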
GNU parallel also knows the option --results, which can be given either a directory name (the task parameters must then be unique) or a file name, e.g. one ending in .csv or .json. The results then contain the stdout, stderr, and sequence number of each task.
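For example (output.csv and outdir are placeholder names):

```bash
# write stdout, stderr and sequence number of every task into one CSV file
parallel --results output.csv -j "$SLURM_NTASKS" task {#} {} ::: "${PARAMS[@]}"

# or: one subdirectory per task below the given directory
parallel --results outdir -j "$SLURM_NTASKS" task {#} {} ::: "${PARAMS[@]}"
```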
Task Dependency Graph
The semantics here is that some tasks may depend on the successful execution of one or more earlier tasks. Such task dependencies can be illustrated in a task dependency graph (in most cases it is just a tree). For complicated task dependencies, dedicated workflow tools are available (each with its strengths and weaknesses; please also consider RADICAL-Cybertools).
However, for simple task dependency chains, GNU parallel can be used as a simple task workflow tool. The rough idea is that tasks are placed in a file; on successful completion, some of these tasks append their subsequent task(s) to the same file. Using (essentially) tail -f, these updates of the file are fed to GNU parallel.
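A minimal sketch of this idea (file and script names are placeholders):

```bash
# the task file contains one shell command per line; a finishing task may
# append its follow-up task(s) to the same file
TASKFILE=tasks.txt
echo "srun -n 1 ./stage1.sh && echo 'srun -n 1 ./stage2.sh' >> $TASKFILE" > "$TASKFILE"

# feed new lines of the task file to GNU parallel as they appear;
# note that tail -f does not terminate by itself (see stop_monitor below)
tail -n +1 -f "$TASKFILE" | parallel -j "$SLURM_NTASKS"
```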
Another option is a combination of writing to the task file (appending tasks) and restarting parallel with --joblog and --resume. For instance:
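A possible sketch (tasks.txt and farm.log are placeholder names):

```bash
# append the task(s) that have just become runnable, then restart GNU parallel;
# with --resume, tasks already recorded as successful in the job log are skipped
echo "srun -n 1 ./stage2.sh" >> tasks.txt
parallel --joblog farm.log --resume -j "$SLURM_NTASKS" :::: tasks.txt
```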
It is important to know that on CoolMUC-2, 28 CPUs per node are available (without hyperthreading). On other node types the resources differ, and the placement of tasks onto these resources must be adapted accordingly.
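For example, with a hypothetical task that needs 4 cores, the number of concurrently running tasks could be derived like this:

```bash
# CoolMUC-2: 28 physical cores per node (no hyperthreading)
CORES_PER_NODE=28
CPUS_PER_TASK=4                   # placeholder: cores needed by one task
NCONCURRENT=$(( SLURM_NNODES * CORES_PER_NODE / CPUS_PER_TASK ))

# the srun call inside task() must then request the matching --cpus-per-task value
parallel -j "$NCONCURRENT" task {#} {} ::: "${PARAMS[@]}"
```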
Another possibility could look as follows.
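The following is only a rough sketch combining the task file, --results, and a stop_monitor function; all names are placeholders, and the termination criterion (no srun job step running anymore) is deliberately simplistic.

```bash
#!/bin/bash
TASKFILE=tasks.txt
echo "srun -n 1 ./stage1.sh && echo 'srun -n 1 ./stage2.sh' >> $TASKFILE" > "$TASKFILE"

# stop_monitor: wait until no srun job steps are running anymore, then kill
# the tail so that the pipeline (and thus GNU parallel) can terminate
stop_monitor() {
  sleep 60                                   # give the first tasks time to start
  while pgrep -u "$USER" -x srun > /dev/null; do
    sleep 30
  done
  pkill -u "$USER" -f "tail .* $TASKFILE"
}

tail -n +1 -f "$TASKFILE" | parallel --results output.csv -j "$SLURM_NTASKS" &
stop_monitor
wait
```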
The output of each task is written to the file output.csv. Caution: a restart via --resume will overwrite this file!
The stop_monitor function merely watches the srun processes that are still running. Without it, tail would keep running (hang) even when no entries are left in the task file.