A lot of scientific software, codes, and libraries can be parallelized via MPI or OpenMP/Multiprocessing.

Avoid submitting inefficient Jobs!

If your code can be parallelized only partially (serial parts remain), familiarize yourself with Amdahl's law and make sure your Job efficiency still stays well above 50%.
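As a rough guide (the numbers below are illustrative, not measurements of any particular code): if a fraction p of the runtime is parallelizable and the Job uses N CPUs, Amdahl's law bounds speedup and efficiency by

S(N) = 1 / ((1 − p) + p/N),   E(N) = S(N) / N

With p = 0.9 and N = 16 this already gives S ≈ 6.4 and E = 40%, i.e. below the 50% target, even though 90% of the code is parallel.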

Default Values

Slurm parameters like --ntasks and --cpus-per-task default to 1 if omitted.

However, when these Slurm parameters are omitted, the corresponding environment variables SLURM_NTASKS and SLURM_CPUS_PER_TASK are not populated. For this reason, most job templates contain export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}, which sets OMP_NUM_THREADS=1 if SLURM_CPUS_PER_TASK is not defined.
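A minimal illustration of this :- fallback (the job name and program name below are placeholders):

#!/usr/bin/env bash

#SBATCH --job-name=omp-test
# --cpus-per-task deliberately omitted, so SLURM_CPUS_PER_TASK stays unset

# Falls back to 1 thread if SLURM_CPUS_PER_TASK is not defined
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

echo "Running with ${OMP_NUM_THREADS} OpenMP thread(s)"
./my_openmp_program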

Pure MPI Jobs (n tasks)

#!/usr/bin/env bash

#SBATCH --job-name=test
#SBATCH --partition=epyc
#SBATCH --mail-type=END,INVALID_DEPEND
#SBATCH --mail-user=noreply@uni-a.de
#SBATCH --time=1-0

# Request memory per CPU
#SBATCH --mem-per-cpu=1G
# Request n tasks
#SBATCH --ntasks=n
# Run all tasks on one node
#SBATCH --nodes=1

# Load application module here if necessary

# No need to pass number of tasks to srun
srun my_program

If --nodes=1 is omitted and all cluster nodes are almost full, Slurm might distribute the tasks across a variable number of nodes. Avoid this scenario by always setting a fixed number of nodes or a range via --nodes=a or --nodes=a-b with a ≤ b, as in the sketch below.
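For example, to let Slurm place 256 tasks on at most two nodes instead of an arbitrary spread (the task count is illustrative):

# Request 256 tasks in total
#SBATCH --ntasks=256
# Allow Slurm to use at least 1 and at most 2 nodes
#SBATCH --nodes=1-2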

srun is the Slurm application launcher/job dispatcher for parallel MPI Jobs and (in this case) inherits all settings from sbatch. This is the preferred way to start your MPI-parallelized application.

discouraged use of mpirun

The use of mpirun is heavily discouraged when queuing your Job via Slurm.

Ensure MPI capability of your application

If your application does not support MPI and you set --ntasks=n (n > 1), your application is simply started n times, each instance needlessly doing the same work.
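One quick heuristic to check this (assuming a dynamically linked executable; my_program is a placeholder) is to look for an MPI runtime among the shared libraries the binary links against:

# Lists the shared libraries of the executable; no output from grep
# usually means the binary was not built against MPI
ldd ./my_program | grep -i mpi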


Pure MPI Jobs (n×m tasks on m nodes)

#!/usr/bin/env bash

#SBATCH --job-name=test
#SBATCH --partition=epyc
#SBATCH --mail-type=END,INVALID_DEPEND
#SBATCH --mail-user=noreply@uni-a.de
#SBATCH --time=1-0

# Request memory per CPU
#SBATCH --mem-per-cpu=1G
# Request n tasks per node
#SBATCH --ntasks-per-node=n
# Run on m nodes
#SBATCH --nodes=m

# Load application module here if necessary

# No need to pass number of tasks to srun
srun my_program

ProTip: Try to keep the number of nodes as small as possible. If n×m ≤ 128, --nodes=1 is always the best choice. This is because the latency of intra-node MPI communication (shared memory) is about two orders of magnitude lower than that of inter-node MPI communication (network/InfiniBand). An example layout is sketched below.
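For example (assuming 128 CPUs per node, as implied by the rule of thumb above), 256 tasks are better packed onto two full nodes than spread over more, partially used ones:

# 256 tasks in total: 128 tasks per node on 2 nodes
#SBATCH --ntasks-per-node=128
#SBATCH --nodes=2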

discouraged use of mpirun

The use of mpirun is heavily discouraged when queuing your Job via Slurm.