Resource limits for parallel jobs on the Linux Cluster
This subdocument describes the constraints under which parallel jobs execute on the cluster systems: maximum run times, maximum memory, and other SLURM-imposed parameters.
Resource limits for interactive jobs
Notes:
- Please do not use resources in this partition to run regular production jobs! This partition is meant for testing!
- A given user account cannot run more than one job at a time.
Partition | Core counts and remarks | Run time limit (hours) | Memory limit (GBytes) |
---|---|---|---|
interactive nodes on CoolMUC-2 | Maximum number of nodes in a job: 4 | 2 (default is 15 minutes) | 56 per node |
interactive nodes on CoolMUC-3 | Maximum number of nodes in a job: 3 | 2 (default is 15 minutes) | ~90 DDR per node, plus 16 HBM per node |
Resource limits for batch jobs
The following is an overview of the resource limits imposed on the various classes of jobs: run time limits, limits on core counts for parallel jobs, and memory limits. Please consult the SLURM specifications subdocument for a more detailed explanation of parallel environments, in particular how to correctly specify memory requirements. With respect to run time limits, it is recommended to always specify a target run time via the --time switch; in particular for smaller jobs, this may allow the scheduler to perform backfilling.
- The designation "shared memory" for parallel jobs assumes that a number of cores assigned by SLURM will be used by threads; typically a command like export OMP_NUM_THREADS=<number> should be issued to achieve this.
- The designation "distributed memory" for parallel jobs assumes that MPI is used to start one single-threaded MPI task per core assigned by SLURM. In principle it is also possible to run hybrid MPI + threaded programs, in which case the number of cores assigned by the system will be equal to the product (# of MPI tasks) * (# of threads), rounded up if necessary.
Job Type | SLURM Cluster | SLURM Partition | Node range | Run time limit (hours) | Memory limit (GByte) |
---|---|---|---|---|---|
CoolMUC-2: 28-way Haswell-EP nodes with Infiniband FDR14 interconnect and 2 hardware threads per physical core (see also example job scripts) | |||||
Small distributed memory parallel (MPI) job | --clusters=cm2_tiny | --partition=cm2_tiny | 1-4 | 72 | 56 per node |
Standard distributed memory parallel (MPI) job | --clusters=cm2 | --partition=cm2_std | 3-24 | 72 | 56 per node |
Large distributed memory parallel (MPI) job | --clusters=cm2 | --partition=cm2_large | 25-64 | 48 | 56 per node |
Shared memory parallel job | --clusters=cm2_tiny | --partition=cm2_tiny | 1 | 72 | 56 |
CoolMUC-3: 64-way Knights Landing 7210F nodes with Intel Omni-Path 100 interconnect and 4 hardware threads per physical core (see also example job scripts) | |||||
Distributed memory parallel job | --clusters=mpp3 | --partition=mpp3_batch (optional) | 1-32 | 48 | ~90 DDR per node, plus 16 HBM per node |
Teramem: HP DL580 shared memory system (see also example job scripts) | |||||
Shared memory thread-parallel job | --clusters=inter | Specify --partition=teramem_inter as well as the number of cores needed by the executable(s) to be started. | 1 (up to 64 logical cores) | 48 (default 8) | ~60 per physical core (each physical core has 2 hyperthreads) |
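As an illustration of the last table row, a thread-parallel Teramem job could be requested roughly as sketched below. The executable name and the requested core count and run time are placeholders, and the memory request syntax should be checked against the SLURM specifications subdocument.

```bash
#!/bin/bash
#SBATCH -J teramem_job
#SBATCH --clusters=inter
#SBATCH --partition=teramem_inter
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32          # up to 64 logical cores may be requested
#SBATCH --time=24:00:00             # within the 48 h limit (default is 8 h)

module load slurm_setup             # placeholder: adapt to your environment
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_large_memory_program           # placeholder executable
```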
If a job does not appear to use its resources properly, it will be terminated at the discretion of LRZ staff or the surveillance system.
Resource limits on housed systems
The clusters and partitions listed in this section are only available for institutes that have a housing contract with LRZ.
Job Type | Architecture | Core counts and remarks | Run time limit (hours) | Memory limit (GByte) |
---|---|---|---|---|
Distributed memory parallel (MPI) jobs | 28-way Haswell-EP nodes with Infiniband FDR14 interconnect | Please specify the cluster --clusters=tum_chem and one of the partitions --partition=[tum_chem_batch, tum_chem_test]. Jobs with up to 392 cores are possible (56 in the test queue). Dedicated to TUM Chemistry. | 384 (test queue: 12) | 2 per task (in MPP mode, using 1 physical core/task) |
Distributed memory parallel (MPI) jobs | 28-way Haswell-EP nodes with Infiniband FDR14 interconnect | Please specify the cluster --clusters=hm_mech. Jobs with up to 336 cores are possible (double that number if hyperthreading is exploited). Dedicated to Hochschule München Mechatronics. | 336 | 18 per task (in MPP mode, using 1 physical core/task) |
Serial or shared memory jobs | 28-way Haswell-EP nodes with Ethernet interconnect | Please specify the cluster --clusters=tum_geodesy. Dedicated to TUM Geodesy. | 240 | 2 per task / 60 per node |
Shared memory parallel job | Intel- or AMD-based shared memory systems | Please specify the cluster --clusters=myri as well as one of the partitions --partition=myri_[p,u]. Dedicated to TUM Mathematics. | 144 | 3.9 per core |
Details on Policies
Policies for interactive jobs
Limitations
Parallel programs should not be started directly from a login shell. Please always use the salloc command to initialize a time-limited interactive parallel environment. Note that the shell initialized by the salloc command will still run on the login node, but executables started with srun (or mpiexec) will be started on the interactive partition that was assigned.
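A minimal interactive session could look as follows. The partition name cm2_inter is an assumption and should be checked against the interactive partition table above; the executable is a placeholder.

```bash
# Request an interactive allocation (the shell itself stays on the login node)
salloc --clusters=inter --partition=cm2_inter --nodes=1 --time=00:30:00

# Start the parallel program on the allocated interactive node(s)
srun ./my_program        # placeholder executable; mpiexec works analogously
```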
Policies for queued batch jobs
General restrictions
- The job name should not exceed 10 characters. If no job name is specified, please do not use excessively long script names.
- Do not use the xargs command to generate command line arguments at submission time. Instead, generate any necessary arguments inside your script.
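For example, instead of piping arguments into the submission command via xargs, a job script can construct them at run time; the file naming scheme and option below are purely illustrative.

```bash
# Inside the job script: build command line arguments at run time
INPUT="input_${SLURM_JOB_ID}.dat"    # hypothetical naming scheme
./my_program --input "$INPUT"        # placeholder executable and option
```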
Scheduling
- For parallel jobs, it is recommended to explicitly specify the run time limit. This may shorten the waiting time, since the job might be run in backfill mode (in other words: use resources that are free while the scheduler tries to fit another large job into the system). Your specification gives the scheduler the information required to organize this.
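For instance, a realistic estimate well below the partition limit can be given directly in the job script; the six-hour value below is only an example.

```bash
#SBATCH --time=06:00:00    # realistic estimate; shorter requests are easier to backfill
```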
Jobs in Hold
Jobs in user hold will be removed at the LRZ administrators' discretion if they are older than 8 weeks.
Job Submissions
- Submission of large numbers of jobs (>100, including array jobs) with very short run times (< 1 min) is considered a misuse of resources. It wastes computational resources and - if mail notifications are used - disrupts the notification system. Users who submit such jobs will be banned from further use of the batch system. Bundle the individual jobs into a much bigger one (see the sketch after the table below)!
- There are maximum numbers of jobs that can be submitted by a user. These limits are different for each cluster and may change over time, depending on the cluster load.
Cluster | Limit on job submission | Limit on running jobs |
---|---|---|
inter | 2 | 1 |
rvs | 1 | 1 |
mpp3 | unlimited | 50 |
serial | 250 | 100 |
cm2_tiny | 50 | 10 |
cm2 | 50 | 4 / 2 * |
*4 jobs on cm2_std, 2 jobs on cm2_large
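A simple way to bundle many short tasks is to loop over them inside a single job script, as sketched below; the cluster name, task count, and file names are placeholders to be adapted to your workload.

```bash
#!/bin/bash
#SBATCH -J bundled_tasks
#SBATCH --clusters=serial            # placeholder: choose the cluster appropriate for the workload
#SBATCH --time=02:00:00

# Run 100 short tasks sequentially inside one job instead of submitting 100 jobs
for i in $(seq 1 100); do
    ./my_short_task "input_${i}.dat"   # placeholder executable and input files
done
```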
Memory use
- Jobs exceeding the physical memory available on the selected node(s) will be removed, either by SLURM itself, by the OOM ("out of memory") killer in the operating system kernel, or at LRZ's discretion, since such usage typically has a negative impact on system stability.
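To stay clear of the OOM killer, keep the memory request within the per-node values listed in the tables above. The exact option syntax (per node vs. per task) is described in the SLURM specifications subdocument; the value below is only an example.

```bash
#SBATCH --mem=50G    # per-node request, kept below the 56 GB of a CoolMUC-2 node
```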
Limits on queued jobs
- In order to prevent monopolization of the clusters by a single user, a limit of 50 queued jobs per user is imposed on both CoolMUC-2 and CoolMUC-3. These limits may change over time, depending on the cluster load.
Software licenses
- Many commercial software packages have been licensed for use on the cluster; most of these require so-called floating licenses, of which only a limited number are typically available. Since it is not possible to check whether a license is available before a batch job starts, LRZ cannot guarantee that a batch job requesting such a license will run successfully.