An extensive documentation of SLURM job handling can be found on the LRZ webpages:
https://doku.lrz.de/job-processing-on-the-linux-cluster-10745970.html
https://doku.lrz.de/display/PUBLIC/SLURM+Workload+Manager
https://doku.lrz.de/display/PUBLIC/Job+Processing+with+SLURM+on+SuperMUC-NG
Jobs must be submitted from CoolMUC-4. First, make sure you have set up 2FA and can access the cluster (instructions at Access to the cluster). You can then submit jobs using a SLURM script. Below are some general instructions and an example script.
Partitions
LCG-C2PAP
As an interim solution, the partition currently to be used is lcg_c2pap. Each node is set to exclusive use, so make sure to allocate all the processors of a node to take full advantage of the resources.
CM4
C2PAP users are also entitled to access LRZ Linux Cluster CoolMUC-4 computing nodes. The available partitions, and how to select the one that most suits your needs, are described in detail here: https://doku.lrz.de/job-processing-on-the-linux-cluster-10745970.html
In particular, it is worth noting that on CoolMUC-4 it is possible to test jobs interactively, using the partition cm4_inter. The partition has a 2-hour time limit, but it is very useful for testing module loading, code compilation, and your code configuration. The interactive queue is generally relatively empty and should allocate resources quite quickly.
See here for how to check the live usage of partitions.
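From the command line, the standard SLURM tools also show the current state of the partitions; a minimal sketch (the -M flag selects the cluster, as in the examples below):
sinfo -M inter -p cm4_inter   # node states of the interactive partition
squeue -M cm4 -p cm4_tiny     # jobs currently queued or running on cm4_tiny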
Interactive jobs
The following command requests an allocation on the interactive queue, using, for example, 2 MPI ranks:
salloc -M inter -p cm4_inter -n 2 -c 1 -t 00:30:00
If you would also like to use OpenMP threads, set -c <num_of_threads>. The maximum time you can request is 2 hours, so the interactive queue is intended for testing purposes only.
This will log you in on one of the available nodes, where you can navigate the directory structure in the same way as on the login node, as well as load modules, compile, and run interactively.
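For example, a typical interactive test session might look like the following sketch (module choice, paths, and the executable name are placeholders; adapt them to your own setup):
cd /path/to/code/directory
ml intel intel-mpi            # load your compiler and MPI modules
make -j                       # compile on the allocated node
export OMP_NUM_THREADS=1      # number of OpenMP threads per rank
mpiexec -n 2 ./my_executable  # run on the 2 MPI ranks requested above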
Batch job options
Jobs can be submitted to the queue using
sbatch script.slurm
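After submission, the job can be monitored and, if necessary, cancelled with the standard SLURM commands; note that on LRZ the -M/--clusters flag is usually needed to address the right cluster (cm4 is used here as an example, <job_id> is a placeholder):
squeue -M cm4 -u $USER     # list your queued and running jobs
scancel -M cm4 <job_id>    # cancel a job
sacct -M cm4 -j <job_id>   # accounting information for a finished job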
Email notifications
You are required to set an email on the SLURM job, as per LRZ regulations.
#SBATCH --mail-type=END
#SBATCH --mail-user=insert_your_email_here
Valid --mail-type values: NONE, BEGIN, END, FAIL, REQUEUE
Resources
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=8
This will allocate 1 node with 2 MPI tasks; each MPI task will then have 8 CPUs available for OpenMP or another shared-memory library.
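In total this corresponds to 1 node × 2 tasks × 8 CPUs = 16 cores. At run time the same numbers are available through SLURM environment variables, as used in the full example script below (./my_executable is a placeholder):
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # 8 OpenMP threads per MPI task
mpiexec -n $SLURM_NTASKS ./my_executable      # 2 MPI ranks in total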
Partition selection
For lcg_c2pap partition:
#SBATCH --clusters=lcg
#SBATCH --partition=lcg_c2pap
Example for CoolMUC-4:
#SBATCH --clusters=cm4
#SBATCH --partition=cm4_tiny
Example MPI/OpenMP hybrid job
#!/bin/bash
#SBATCH -J <run_name>
#SBATCH -o ./%x.%j.out
#SBATCH -e ./%x.%j.err
#SBATCH -D <run_directory>
#SBATCH --get-user-env
#SBATCH --export=NONE
#SBATCH --clusters=lcg
#SBATCH --partition=lcg_c2pap
#SBATCH --account=users #<--- only necessary for lcg_c2pap; remove it for CM4.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=<MPIranks>
#SBATCH --cpus-per-task=<OMPthreads>
#SBATCH --time=24:00:00
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=<your_email>
module load slurm_setup
# Load your modules here
# On the lcg_c2pap partition, the software stack differs slightly from CoolMUC-4, hence the code needs to be compiled on the node. For example:
cd /path/to/code/directory
make clean
make -j
cd $SLURM_SUBMIT_DIR
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpiexec -n ${SLURM_NTASKS} <path/to/executable>
Suggested module stack on LCG partition
To load a configuration of Intel compilers + Intel MPI:
ml spack/22.2.1
ml intel
ml intel-mpi
Additional modules that have been tested to work on top of the above:
ml hdf5/1.10.7-intel21-impi
ml gsl/2.7-intel21
ml fftw/3.3.10-intel21-impi-openmp
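As a quick check that this stack is usable, a hedged compile sketch (the source file name and the libraries you actually link against are placeholders; mpiicc is the Intel MPI wrapper for the classic Intel C compiler provided by this stack, and the modules are assumed to export the usual include and library search paths):
mpiicc -qopenmp -O2 my_code.c -o my_code -lfftw3_omp -lfftw3 -lgsl -lgslcblas -lhdf5 -lm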
Suggested module stack on CoolMUC-4 partitions
GCC
ml spack/23.1.0
ml gcc/12.2.0
ml intel-mpi/2021.9.0
Additional tested modules:
ml gsl/2.7.1-gcc12
ml fftw/3.3.10-gcc12-impi-openmp
ml hdf5/1.8.23-gcc12
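A corresponding hedged compile sketch for the GCC stack (mpicc is the Intel MPI wrapper that uses the GNU compiler by default; file and library names are again placeholders):
mpicc -fopenmp -O2 my_code.c -o my_code -lfftw3_omp -lfftw3 -lgsl -lgslcblas -lhdf5 -lm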
Intel
ml spack/24.4.0
ml intel/2024.1.0
ml intel-mpi/2021.12.0
Additional tested modules:
ml fftw/3.3.10-intel24-impi-openmp
ml gsl/2.6-intel24
ml hdf5/1.14.5-intel24-impi
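Note that the oneAPI 2024 compilers no longer ship the classic icc/icpc, so the LLVM-based MPI wrappers (mpiicx, mpiicpx, mpiifx), if provided by this Intel MPI release, are the ones to use. A quick sanity check of the loaded stack:
module list          # confirm the loaded modules
mpiicx --version     # C wrapper (LLVM-based icx)
mpirun --version     # Intel MPI version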