Many scientific programs, codes, and libraries can be parallelized via MPI or via OpenMP/multiprocessing.

Avoid submitting inefficient Jobs!

If your code can be parallelized only partially (serial parts remain), familiarize yourself with Amdahl's law and make sure your Job efficiency stays well above 50%.
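To see why this matters, recall Amdahl's law: if a fraction p of the runtime is parallelizable, the speedup on N CPUs is at most

    S(N) = 1 / ((1 - p) + p / N)

and the efficiency is E(N) = S(N) / N. For example, a code with p = 0.9 reaches S(8) ≈ 4.7 on 8 CPUs (E ≈ 59%), but only S(16) ≈ 6.4 on 16 CPUs (E ≈ 40%), which is already below the 50% mark. Requesting more CPUs is not always better.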

Default Values

Slurm parameters like --ntasks and --cpus-per-task default to 1 if omitted.
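A purely serial Job can therefore simply omit both parameters. The following minimal sketch (partition name and resource values are placeholders, mirroring the hybrid example below) runs exactly one task on one CPU:

#!/usr/bin/env bash

#SBATCH --job-name=serial-test
#SBATCH --partition=epyc
#SBATCH --mem-per-cpu=1G
#SBATCH --time=1-0

# --ntasks and --cpus-per-task are omitted, so both default to 1
srun my_program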

Hybrid MPI+OpenMP Jobs (n×m×p CPUs over m×p Tasks on p Nodes)

Many codes combine multithreading with multi-node parallelism in a hybrid OpenMP/MPI approach. Below is a Slurm script suitable for such a code:

#!/usr/bin/env bash

#SBATCH --job-name=test
#SBATCH --partition=epyc
#SBATCH --mail-type=END,INVALID_DEPEND
#SBATCH --mail-user=noreply@uni-a.de
#SBATCH --time=1-0

# Request memory per CPU
#SBATCH --mem-per-cpu=1G
# Request n CPUs (OpenMP threads) per task
#SBATCH --cpus-per-task=n
# Request m tasks (MPI ranks) per node
#SBATCH --ntasks-per-node=m
# Run on p nodes
#SBATCH --nodes=p

# Load application module here if necessary

# Set the number of OpenMP threads to the number of CPUs per task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# No need to pass the number of tasks to srun; it is taken from the allocation
srun my_program
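As a concrete (illustrative) instantiation: requesting p = 2 nodes, m = 4 tasks per node, and n = 8 CPUs per task yields 2 × 4 = 8 MPI ranks with 8 OpenMP threads each, i.e. 64 CPUs in total:

#SBATCH --cpus-per-task=8    # n: OpenMP threads per MPI rank
#SBATCH --ntasks-per-node=4  # m: MPI ranks per node
#SBATCH --nodes=2            # p: nodes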

Make sure your code actually supports this combined MPI + OpenMP mode of operation.

Discouraged Use of mpirun

The use of mpirun is strongly discouraged when queuing your Job via Slurm; use srun instead, as shown below.
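srun takes the task count, CPU binding, and node placement directly from the Slurm allocation, so nothing has to be passed by hand. A minimal sketch (the -np value is illustrative):

# Discouraged inside a Slurm batch script:
# mpirun -np 64 my_program

# Preferred: srun reads everything from the Slurm allocation
srun my_program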