To ensure fair usage of cluster resources, each Slurm partition has a QoS (Quality of Service) that is set for each cluster partition at HPC project (account) level and limits the cluster resources available under the slurm partition.

(MaxCpuPerAccount) - the number of cores requested by the entire HPC-Project currently exceeds this limit;

(MaxJobsPerAccount) - the number of Jobs submitted by the entire HPC-Project currently exceeds this limit;

(MaxGRESPerAccount) - the amount/number of other resources (for example GPUs) requested by the entire HPC-Project currently exceeds this limit.

Why does the slurm squeue command show (QOSMaxJobsPerUserLimit) next to the submitted job?

To ensure fair usage of cluster resources, each Slurm partition has a QoS (Quality of Service) that is set for each cluster partition at HPC project (account) level and limits the cluster resources available under the slurm partition.

(QOSMaxJobsPerUserLimit) - the number of Jobs submitted by the user currently exceeds this limit;

Why does the slurm squeue command show (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions) next to the submitted job?

As the message states Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions.

In most cases, the “Reason” field in the last line of the command

scontrol show node <nodename>

output provides additional information about the status of the node.
For example:
# scontrol show node licca047 NodeName=licca047 Arch=x86_64 CoresPerSocket=64 CPUAlloc=0 CPUEfctv=128 CPUTot=128 CPULoad=0.08 AvailableFeatures=Epyc-7713 ActiveFeatures=Epyc-7713 Gres=gpu:a100:3(S:1) NodeAddr=licca-e-047 NodeHostName=licca047 Version=25.05.0 OS=Linux 6.8.0-54-generic #56-Ubuntu SMP PREEMPT_DYNAMIC Sat Feb 8 00:37:57 UTC 2025 RealMemory=1021000 AllocMem=0 FreeMem=1020914 Sockets=2 Boards=1 State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A Partitions=epyc-gpu-test BootTime=2025-09-12T13:39:18 SlurmdStartTime=2025-09-12T13:43:30 LastBusyTime=2025-09-11T20:35:41 ResumeAfterTime=None CfgTRES=cpu=128,mem=1021000M,billing=138,gres/gpu=3,gres/gpu:a100=3 AllocTRES= CurrentWatts=0 AveWatts=0 Reason=Reserved for workshop [root@2025-09-12T13:04:18

[ How do I register myself to use the HPC resources? ] [ How do I get access to the LiCCA or ALCC resources? ] [ What kind of resources are available on LiCCA? ] [ What kind of resources are available on ALCC? ] [ How do I acknowledge the usage of HPC resources on LiCCA in publications? ] [ How do I acknowledge the usage of HPC resources on ALCC in publications? ] [ What Slurm Partitions (Queues) are available on LiCCA? ] [ What Slurm Partitions (Queues) are available on ALCC? ] [ What is Slurm? ] [ How do I use Slurm batch system ] [ How do I submit the serial calculations? ] [ How do I run multithreaded calculations? ] [ How do I run parallel calculations on several nodes? ] [ How do I run GPU based calculations? ] [ How do I check Slurm current schedule, queue? ] [ Is there some kind of Remote Desktop for the cluster? ] [ If I have a question which is not listed here? ] [ If I want to report a problem? ] [ Which version of Python could be used? ] [ Which Anaconda, Miniconda, Miniforge, Micromamba? ] [ How do I monitor live CPU/GPU/memory/disk utilization? ] [ How do I check my GPFS filesystem usage and quota situation? ] [ Why does the slurm squeue command show (MaxCpuPerAccount), (MaxJobsPerAccount) or (MaxGRESPerAccount) next to the submitted job? ] [ Why does the slurm squeue command show (QOSMaxJobsPerUserLimit) next to the submitted job? ] [ Why does the slurm squeue command show (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions) next to the submitted job? ]

Bereichsverknüpfungen

Seitenhierarchie

How do I register myself to use the HPC resources?

How do I get access to the LiCCA or ALCC resources?

What kind of resources are available on LiCCA?

What kind of resources are available on ALCC?

How do I acknowledge the usage of HPC resources on LiCCA in publications?

How do I acknowledge the usage of HPC resources on ALCC in publications?

What Slurm Partitions (Queues) are available on LiCCA?

What Slurm Partitions (Queues) are available on ALCC?

What is Slurm?

How do I use Slurm batch system

How do I submit the serial calculations?

How do I run multithreaded calculations?

How do I run parallel calculations on several nodes?

How do I run GPU based calculations?

How do I check Slurm current schedule, queue?

Is there some kind of Remote Desktop for the cluster?

If I have a question which is not listed here?

If I want to report a problem?

Which version of Python could be used?

Which Anaconda, Miniconda, Miniforge, Micromamba?

How do I monitor live CPU/GPU/memory/disk utilization?

How do I check my GPFS filesystem usage and quota situation?

Why does the slurm squeue command show (MaxCpuPerAccount), (MaxJobsPerAccount) or (MaxGRESPerAccount) next to the submitted job?

Why does the slurm squeue command show (QOSMaxJobsPerUserLimit) next to the submitted job?

Why does the slurm squeue command show (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions) next to the submitted job?

Bereichsverknüpfungen

Seitenhierarchie

FAQ and Troubleshooting - LiCCA

How do I register myself to use the HPC resources?

How do I get access to the LiCCA or ALCC resources?

What kind of resources are available on LiCCA?

What kind of resources are available on ALCC?

How do I acknowledge the usage of HPC resources on LiCCA in publications?

How do I acknowledge the usage of HPC resources on ALCC in publications?

What Slurm Partitions (Queues) are available on LiCCA?

What Slurm Partitions (Queues) are available on ALCC?

What is Slurm?

How do I use Slurm batch system

How do I submit the serial calculations?

How do I run multithreaded calculations?

How do I run parallel calculations on several nodes?

How do I run GPU based calculations?

How do I check Slurm current schedule, queue?

Is there some kind of Remote Desktop for the cluster?

If I have a question which is not listed here?

If I want to report a problem?

Which version of Python could be used?

Which Anaconda, Miniconda, Miniforge, Micromamba?

How do I monitor live CPU/GPU/memory/disk utilization?

How do I check my GPFS filesystem usage and quota situation?

Why does the slurm squeue command show (MaxCpuPerAccount), (MaxJobsPerAccount) or (MaxGRESPerAccount) next to the submitted job?

Why does the slurm squeue command show (QOSMaxJobsPerUserLimit) next to the submitted job?

Why does the slurm squeue command show (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions) next to the submitted job?