Page Content

Scope

This page summarizes Linux Cluster segments, their specifications as well as according basic job processing settings.

Basic Rules of Job Processing on the Linux Cluster

Choose the compute resources of the Linux Cluster with great care! Although the CoolMUC-4 nodes have many more CPU cores than the old CoolMUC-2 system, the total number of nodes has been reduced significantly!
Select the cluster segments which fit your needs! Learn more in our Guidelines for Resource Selection.
Do not waste/misuse resources! For example, If you intend to run a job on a single CPU core only, then the "serial" cluster segment is your choice. Running such a job on the "cm4_tiny" partition would block an entire compute node. 111 out of 112 available cores would do nothing!
Important Note: Any jobs requesting less then 56 CPU cores should be directed into the serial queue. cm4_tiny/std queues are reserved for consequently and efficiently parallelized workloads.
Linux Cluster system administrators are monitoring the CoolMUC-4 system! If resources are used inappropriately, we get in touch with users for further consulting on the issue.
Please do not run compute jobs on the login nodes! Read more here: Usage Policy on Login Nodes
In recurring cases of misuse, access to the HPC system might be blocked!

Do you need help or do you fear misusing the HPC resources?

Just get in touch with us for further consulting via Linux Cluster request at LRZ Servicedesk or via HPC Lounge.

Step 1: Get the resource which fits my needs aka "jobs that the Slurm scheduler likes"

In order to allocate the optimal resource for your job, you might consider different job types:

standard distributed memory parallel job, e.g. using MPI
shared-memory parallel job on a single node, e.g. using OpenMP
single-core (serial) job
single-node parallel job requiring a fraction of both the available CPU cores and memory
single-node parallel job requiring a fraction of the available CPU cores but needing almost all memory of a regular compute node on CoolMUC-4
large-memory job

The following decision matrix provides assistance in making the right choice.

Decision matrix to select the appropriate partition
How many CPU resources does my job need?	How much memory does my job need?
How many CPU resources does my job need?	up to 488 GB per node	489 - 1000 GB per node	up to 6000 GB per node
more than 4 compute nodes	SuperMUC-NG at LRZ (How to apply for a project?) https://hpc.fau.de
1 - 4 compute nodes	cm4_std
more than 56 CPU cores and max. 1 compute node	cm4_tiny	teramem_inter	teramem_inter
1 - 56 CPU cores	serial_std serial_long	serial_std serial_long	teramem_inter

Step 2: Based on my choice, what job specifications do I have to set?

In the following, we list the appropriate cluster- and partition-specific Slurm settings which need to be specified in your job. For full job script examples, please refer to Running parallel jobs on the Linux-Cluster and Running serial jobs on the Linux-Cluster.

Cluster "cm4"
Partition name	Slurm job settings
cm4_std	--clusters=cm4 --partition=cm4_std --qos=cm4_std
cm4_tiny	--clusters=cm4 --partition=cm4_tiny --qos=cm4_tiny

Cluster "serial"
Partition name	Slurm job settings
serial_std	--clusters=serial --partition=serial_std --mem=<memory_per_node_GB>G # e.g.: --mem=100G
serial_long	--clusters=serial --partition=serial_long --mem=<memory_per_node_GB>G # e.g.: --mem=100G

Cluster "inter"
Partition name	Slurm job settings
cm4_inter	--clusters=inter --partition=cm4_inter
teramem_inter	--clusters=inter --partition=teramem_inter --mem=<memory_per_node_GB>G # e.g.: --mem=100G

Step 3: Check further specifications and limits of clusters and partitions

Cluster specifications					Limits						Node usage
Slurm cluster segment	Slurm partition	Compute nodes in partition	CPU cores per node	GPUs per node	Node range per job min - max	Minimum CPU limit (physical cores)	Maximum CPU limit (physical cores)	Maximum job runtime (hours)	Maximum running (submitted) jobs per user	Memory limit	Node usage
Cluster system CoolMUC-4: Sapphire Rapids (Intel(R) Xeon(R) Platinum 8480+) nodes
cm4	cm4_std	100 (overlapping partitions)	112 (physical) 224 (logical)	--	2 - 4	112 per job	448 per job	24	2 (25)	488 GiB per node	jobs get compute nodes exclusively
cm4	cm4_tiny	100 (overlapping partitions)		--	1 - 1	56 per job	112 per job	24	4 (25)
inter	cm4_inter	6		--	1 - 4	1 per job	112 per job	2	1 (2)
Cluster system CoolMUC-4: Ice Lake (Intel(R) Xeon(R) Platinum 8380) nodes
serial	serial_std	5	80 (physical) 160 (logical)	--	1 - 1	1	56 in sum over all jobs on serial cluster (see remarks below)	48	20 (200)	1000 GiB per node	jobs / users share compute nodes
serial	serial_long	1	80 (physical) 160 (logical)	--	1 - 1	1		168	20 (200)	1000 GiB per node	jobs / users share compute nodes
Cluster system Teramem: single-node shared-memory system (Intel Xeon Platinum 8360HL), 6 TB memory
inter	teramem_inter	1	96 (physical) 192 (logical)	--	1 - 1	1 per job	96 per job	240	1 (1)	approx. 60 GiB per physical core available	jobs (users) share the Teramem node

Remarks on CPU limit

CPU limit refers to the number of available hardware (physical) cores. When setting up your Slurm jobs, please consider the following characteristics and restrictions:

Jobs may incorporate all logical cores, i.e., the number of physical cores times number of threads that can simultaneously run on one core via hyperthreading. In case of CoolMUC-4, this is twice the number of physical cores. Shared-memory jobs (e.g. using OpenMP) or hybrid jobs (e.g. using MPI + OpenMP) may benefit from that. Please also refer to our Slurm job examples.
The total number of requested cores, i.e. the product of "tasks per node" (e.g. MPI processes) and "CPUs per tasks" (e.g. OpenMP threads),...
- Must not exceed the maximum number of cores which is available or allowed to be used per compute node on a particular partition!
- Must not be smaller than the minimum number of cores allowed to be used on a compute node!
Important: Due to limited hardware resources in the serial cluster, the following restriction applies: The total number of requested CPU cores by a user on the entire serial cluster, which are in use at the same time in running jobs, must not exceed 56! For example, you may run a single job with 56 cores or 4 jobs with 10 cores + 1 job with 16 cores. Further jobs can be submitted but will have to wait.

The terms "tasks per node" and "CPUs per tasks" refer to the according Slurm specifications "--tasks-per-node" and "--cpus-per-task". Please refer to our Slurm documentation and CoolMUC-4 job script examples.

Remarks on Memory limit

Keep in mind that the default memory per job on the serial cluster scales with the number of allocated CPU cores. For example, if the job requests 50 percent of the CPU cores, it also gets 50 percent of the memory. If more memory is required, this has to be defined in the job via the "–mem" option.

Remarks on general usage of the partitions

Please note:

Partitions cm4_std, cm4_tiny, serial_std and serial_long are only intended for batch jobs (via sbatch command, see below)!
The partition cm4_inter is intended for interactive jobs (via salloc command, see below). Due to the short job runtime, this partition is suitable for small test jobs.
Both batch jobs and interactive jobs can be run on the partition teramem_inter.

Common/exemplary Slurm commands on the Linux Cluster for job submission and job management

Submit a batch job:

sbatch <MY_SLURM_JOB_SCRIPT>

Start an interactive job, for example allocate job on cm4_inter for 30 minutes:

# Allocate 1 CPU core on a shared node of the interactive queue of CM4:
salloc --clusters=inter --partition=cm4_inter -n 1 --mem=10G -t 00:30:00
#
# Allocate for testing purposes 5 CPU cores on a shared node of the interactive queue of CM4:
salloc --clusters=inter --partition=cm4_inter -n 5 --mem=50G -t 00:30:00
#
# Allocate 1 full CM4 node with 112 CPU cores in the interactive queue of CM4:
salloc --clusters=inter --partition=cm4_inter -N 1 -t 00:30:00

Show my running/waiting jobs, for example on partition cm4_tiny:

# "-M" can be used instead of "--clusters=", same for "-p" vs. "--partition="
# squeue -M <CLUSTER-NAME> -p <PARTITION-NAME> -u $USER

squeue -M cm4 -p cm4_tiny -u $USER

Squeue can provide a lot of information on jobs, e.g. get the estimated (!) start time and state of my jobs:

squeue -M <CLUSTER-NAME> -p <PARTITION-NAME> -j <job-id> -O "jobid,state,priority,starttime,reason"

Cancel a single job or a space-separated list of jobs via job IDs

scancel -M <CLUSTER-NAME> <JOB-ID>

Show details on running, waiting and finished jobs

# compact output of jobs started after a certain date
sacct -M <CLUSTER-NAME> -r <PARTITION-NAME> -X -u $USER --starttime=2024-12-02T00:00:01

# some more details, e.g.: resources, start time, run time, max. memory consumption (KB), job state, reason, exit code, list of allocated nodes
sacct -M <CLUSTER-NAME> -r <PARTITION-NAME> –o jobid,nnodes,ntasks,start,elapsed,maxrss,state,reason,exitcode,nodelist

Learn more here

SLURM Workload Manger documentation (commands and links to examples).