Flux Framework - Flux in Slurm
Getting started ...
What is it?
Flux Framework is a task-scheduling and resource-management framework, much like Slurm. However, it can run completely in user space, and we describe it here as an alternative to Slurm's srun task-farming capabilities.
Flux is rather versatile, but also quite complex, and still under very active development. We must therefore refer you to the Flux documentation for all the details left out here.
Using LRZ Module
> module use /lrz/sys/share/modules/extfiles/    # for the time we test it
> module av flux-core
------------------ /lrz/sys/share/modules/extfiles -------------------------
flux-core/0.63.0
> module load flux-core
Installation
The simplest installation is probably via conda.
> conda create -n my_flux -c conda-forge flux-core flux-sched
> conda activate my_flux
(my_flux) > flux version
commands:               0.59.0
libflux-core:           0.59.0
build-options:          +hwloc==2.8.0+zmq==4.3.5
If you need a more up-to-date version of Flux, you probably cannot avoid building it from source (https://github.com/flux-framework/). But Spack may help you to simplify that process.
Another option for installing flux-core is Spack (user_spack). However, in order to get the latest version, manual manipulation of the package recipe may be necessary.
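With a working Spack setup, the installation could look roughly as follows (a minimal sketch; package availability and versions depend on your Spack instance):

```shell
> spack install flux-core flux-sched    # build flux-core plus the Fluxion scheduler
> spack load flux-core                  # make the flux command available in this shell
> flux version                          # verify the installation
```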
Interactive Workflows
Truly interactive work with Flux is probably of limited practical use. But for testing purposes, and as a sort of starting point, let's take a short look at it. We start from a login node.
login > conda activate my_flux                          # activate flux environment
(my_flux) login > srun -N 2 -M inter -p cm2_inter --pty flux start    # allocate resources (on cluster/partition you want)
i22r07c05s05 > flux uptime                              # basic info about the running flux instance
 14:11:57 run 7.9s, owner ⼌⼌⼌⼌⼌⼌⼌, depth 0, size 2
i22r07c05s05 > flux resource info                       # basic info about the resources managed by the flux instance
2 Nodes, 56 Cores, 0 GPUs
i22r07c05s05 > flux run --label-io -N2 hostname         # run a task (here, one per node)
0: i22r07c05s05
1: i22r07c05s08
i22r07c05s05 > flux bulksubmit --output=log.{{id}} -n 1 -c 7 /lrz/sys/tools/placement_test_2021/bin/placement-test.omp_only -t 7 -d 20 ::: $(seq 0 100)
ƒCF6D7Bu                                                # flux job IDs
[...]
i22r07c05s05 > flux jobs -a
       JOBID USER      NAME        ST NTASKS NNODES     TIME INFO
[...]
    ƒCL2LiaU ⼌⼌⼌⼌⼌⼌⼌   placement+   S      1      -        -
    ƒCGVkRgt ⼌⼌⼌⼌⼌⼌⼌   placement+   R      1      1   8.580s i22r07c05s05
    ƒCGVkRgs ⼌⼌⼌⼌⼌⼌⼌   placement+   R      1      1   10.15s i22r07c05s11
    ƒCGUGSQa ⼌⼌⼌⼌⼌⼌⼌   placement+   R      1      1   12.45s i22r07c05s11
    ƒCGUGSQZ ⼌⼌⼌⼌⼌⼌⼌   placement+   R      1      1   12.45s i22r07c05s11
    ƒCGUGSQY ⼌⼌⼌⼌⼌⼌⼌   placement+   R      1      1   12.79s i22r07c05s05
    ƒCGUGSQX ⼌⼌⼌⼌⼌⼌⼌   placement+   R      1      1   13.35s i22r07c05s11
    ƒCGSnT8C ⼌⼌⼌⼌⼌⼌⼌   placement+   R      1      1   14.15s i22r07c05s05
    ƒCGSnT8B ⼌⼌⼌⼌⼌⼌⼌   placement+   R      1      1   17.15s i22r07c05s05
    ƒCG62dBP ⼌⼌⼌⼌⼌⼌⼌   placement+  CD      1      1   23.41s i22r07c05s05
    ƒCG62dBQ ⼌⼌⼌⼌⼌⼌⼌   placement+  CD      1      1   19.54s i22r07c05s11
    ƒCG62dBM ⼌⼌⼌⼌⼌⼌⼌   placement+  CD      1      1   20.68s i22r07c05s11
[...]
i22r07c05s05 > exit
flux has an elaborate built-in help system. Please use flux help and flux help <command> to acquire some information or a reminder. flux submit/bulksubmit, flux cancel <job ID> and flux jobs -a can be used similarly to sbatch, scancel and squeue under Slurm. flux cancelall -f may prove a highlight during first tests.
Non-Interactive Workflows
The far more common approach to using Flux is probably to bundle a number of tasks within one Slurm job. The scope of possible workflows is vast and cannot be covered here, but an example should illustrate the basic principle.
With srun, the Flux instances are started (one process per node), together with a script, workflow.sh, which contains the actual Flux workflow description. We use some dummy programs here which report the rank/thread-to-CPU placement; it is probably a good idea to check its correctness.
This Slurm script is submitted as usual via sbatch.
NB: We tested this with Intel MPI, where flux run works remarkably well concerning rank/thread placement.
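The pattern described above could look roughly as follows. This is a minimal sketch, not an LRZ prescription: partition, node count, time limit and the task binary (my_task) are placeholders you must adapt.

```shell
#!/bin/bash
#SBATCH -J flux_farm
#SBATCH -N 2                     # nodes for the flux instance
#SBATCH --ntasks-per-node=1      # one flux broker per node
#SBATCH -t 00:30:00

# Start one flux broker per node; broker rank 0 executes workflow.sh.
srun flux start ./workflow.sh

# --- workflow.sh (the actual flux workflow description) ---
# #!/bin/bash
# flux resource info                                        # sanity check
# flux bulksubmit -n 1 -c 7 ./my_task {} ::: $(seq 0 100)   # farm out tasks
# flux queue drain                                          # wait until the queue is empty
```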
Waitable Jobs
In general, flux submit submits a job and returns to the shell. Specifically, for mass jobs within a Slurm job, the workflow script above would then simply return after the submission of the last Flux job. To handle this, flux submit knows the option --flags=waitable. Together with a subsequent flux job wait --all, this yields an idiom similar to srun ... & followed by wait for Slurm job farming. However, the Flux documentation claims that flux job wait is much more lightweight than bash's wait.
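Put together, the idiom could look like this (a sketch; my_task and the task count are placeholders):

```shell
#!/bin/bash
# Submit many independent tasks, marked waitable so we can block on them.
for i in $(seq 1 100); do
    flux submit --flags=waitable -n 1 ./my_task "$i"
done
flux job wait --all    # returns only after all waitable jobs have finished
```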
Dependency Trees
flux submit also knows job dependencies via the --dependency=... option. Here, ... can for instance be afterok:JOBID. That is semantically equal to Slurm's sbatch job dependencies.
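A small chain could be built like this (a sketch; preprocess.sh and compute.sh are placeholder scripts):

```shell
# flux submit prints the ID of the submitted job; capture it for the dependency.
jobid=$(flux submit ./preprocess.sh)
# compute.sh starts only if preprocess.sh completes successfully.
flux submit --dependency=afterok:${jobid} ./compute.sh
```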
After the Slurm Job Stops
Flux does not seem to provide a persistent job-bookkeeping facility. But flux queue offers some capabilities to document/archive the status of Flux's queue. Please check the cheat sheet below.
# Stop the queue, wait for running jobs to finish, and dump an archive.
flux queue stop
flux queue idle
flux dump ./archive.tar.gz
In order to execute that within a Slurm job, a bash trap ... EXIT may be necessary (where ... is some cleanup bash function).
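Such a trap could be sketched as follows (archive_queue is a hypothetical name for the cleanup function):

```shell
#!/bin/bash
# Cleanup function executed when the workflow script exits, normally or not.
archive_queue() {
    flux queue stop               # stop starting new jobs
    flux queue idle               # wait for running jobs to finish
    flux dump ./archive.tar.gz    # archive the queue state
}
trap archive_queue EXIT
```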
The last three topics may be easier to address with a workflow manager like Nextflow (assuming it supports Flux at all).
Further Reading
Flux Framework comes with a vast scope of documentation, user guides and tutorials. We suggest that beginners start with the Learning Guide.
To embed Flux into a Slurm frame, please consult the documentation on that topic.
The Cheat Sheet provides a good overview and is of tremendous help.