A lot of scientific software, codes, and libraries can be parallelized via MPI or OpenMP/multiprocessing. Avoid submitting inefficient Jobs! If your code can be parallelized only partially (serial parts remaining), familiarize yourself with Amdahl's law and make sure your Job efficiency is still well above 50%.
Default Values: Slurm parameters like --ntasks and --cpus-per-task default to 1 if omitted.
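To make the Amdahl's law advice concrete, the following minimal sketch estimates the parallel efficiency for a given parallelizable fraction p and task count n, using speedup S(n) = 1 / ((1 - p) + p / n) and efficiency E(n) = S(n) / n. The helper name `amdahl_efficiency` and the example value p = 0.95 are assumptions chosen for illustration; measure the parallel fraction of your own code before relying on such an estimate.

```shell
#!/usr/bin/env bash
# Amdahl's law: speedup S(n) = 1 / ((1 - p) + p / n), efficiency E(n) = S(n) / n,
# where p is the parallelizable fraction of the runtime and n the number of tasks.
# amdahl_efficiency is a hypothetical helper, not part of Slurm.
amdahl_efficiency() {
    local p="$1" n="$2"
    awk -v p="$p" -v n="$n" 'BEGIN {
        speedup = 1 / ((1 - p) + p / n)
        printf "%.2f\n", speedup / n
    }'
}

# Example (assumed value): 95% of the runtime is parallelizable.
for n in 4 16 64; do
    echo "n=$n efficiency=$(amdahl_efficiency 0.95 "$n")"
done
```

Under this model, E(n) = 1 / (n(1 - p) + p), so with p = 0.95 the efficiency reaches 50% at n = 21 tasks and keeps falling beyond that. Requesting more tasks than this would mostly waste resources.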
ProTip: Try to keep the number of nodes as small as possible.
Pure MPI Jobs (n tasks)
#!/usr/bin/env bash
#SBATCH --job-name=test
#SBATCH --partition=epyc
#SBATCH --mail-type=END,INVALID_DEPEND
#SBATCH --mail-user=<e-mail address>
#SBATCH --time=1-0
# Request memory per CPU
#SBATCH --mem-per-cpu=1G
# Request n tasks (total number of MPI ranks)
#SBATCH --ntasks=n
# If possible, run all tasks on one node
#SBATCH --nodes=1
# Load application module here if necessary
# No need to pass number of tasks to srun
srun my_program
If --nodes=1 is omitted and all cluster nodes are almost full, Slurm might distribute a variable number of tasks across a variable number of nodes. Try to avoid this scenario by always setting a minimal number of nodes via --nodes.
srun is the Slurm application launcher/job dispatcher for parallel MPI Jobs and (in this case) inherits all settings from sbatch. This is the preferred way to start your MPI-parallelized application.
The use of mpirun is heavily discouraged when queuing your Job via Slurm.
Pure MPI Jobs (n×m tasks on m nodes)
#!/usr/bin/env bash
#SBATCH --job-name=test
#SBATCH --partition=epyc
#SBATCH --mail-type=END,INVALID_DEPEND
#SBATCH --mail-user=<e-mail address>
#SBATCH --time=1-0
# Request memory per CPU
#SBATCH --mem-per-cpu=1G
# Request n tasks per node
#SBATCH --ntasks-per-node=n
# Run on m nodes
#SBATCH --nodes=m
# Load application module here if necessary
# No need to pass number of tasks to srun
srun my_program
If n×m ≤ 128, --nodes=1 is always the best choice. This is because the latency of intra-node MPI communication (shared memory) is about two orders of magnitude lower than that of inter-node MPI communication (network/InfiniBand).
The use of mpirun is heavily discouraged when queuing your Job via Slurm.
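To check how Slurm actually placed the tasks of a job, each task can print its hostname and the results can be counted per node. The sketch below assumes a helper named `tasks_per_node` (hypothetical, not a Slurm command) and feeds it stand-in hostnames so that it also runs outside of an allocation.

```shell
#!/usr/bin/env bash
# tasks_per_node is a hypothetical helper: it reads one hostname per line
# from stdin and prints "<hostname> <number of tasks>" for each node.
tasks_per_node() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Intended use inside a job script (requires a Slurm allocation):
#   srun hostname | tasks_per_node
# Stand-in input (made-up node names) so the sketch runs without Slurm:
printf 'node01\nnode02\nnode01\nnode01\n' | tasks_per_node
```

If the counts differ from what was requested via --ntasks-per-node and --nodes, the job was probably submitted without fixing the number of nodes.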
Environment variables for different MPI flavors
For modules provided by the HPC-Team, these variables are most likely already set in the corresponding module definition.
# Intel MPI
export I_MPI_PMI_LIBRARY=/hpc/gpfs2/sw/pmi2/current/lib/libpmi2.so
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=mlx
export SLURM_MPI_TYPE=pmi2
# or more simply:
module load impi-envvars
# OpenMPI
export SLURM_MPI_TYPE=pmix_v4 # or pmix_v3 or pmix_v2, depending on what your self-compiled OpenMPI version supports
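The value of SLURM_MPI_TYPE also has to be one that the cluster's Slurm installation offers. As a minimal sketch, `srun --mpi=list` prints the MPI plugin types Slurm was built with. The wrapper name `list_mpi_types` is hypothetical, and the fallback message only exists so that the snippet does not fail on a machine without Slurm.

```shell
#!/usr/bin/env bash
# list_mpi_types is a hypothetical wrapper around "srun --mpi=list".
list_mpi_types() {
    if command -v srun >/dev/null 2>&1; then
        # Redirect stderr too, since the list may be written there.
        srun --mpi=list 2>&1
    else
        echo "srun not found: run this on a cluster login node"
    fi
}

list_mpi_types
```

Pick the PMI/PMIx type that appears in this list and is also supported by your MPI build.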