- How do I register myself to use the HPC resources?
- How do I get access to the LiCCA or ALCC resources?
- What kind of resources are available on LiCCA?
- What kind of resources are available on ALCC?
- How do I acknowledge the usage of HPC resources on LiCCA in publications?
- How do I acknowledge the usage of HPC resources on ALCC in publications?
- What Slurm Partitions (Queues) are available on LiCCA?
- What Slurm Partitions (Queues) are available on ALCC?
- What is Slurm?
- How do I use the Slurm batch system?
- How do I submit serial calculations?
- How do I run multithreaded calculations?
- How do I run parallel calculations on several nodes?
- How do I run GPU-based calculations?
- How do I check the current Slurm schedule and queue?
- Is there some kind of Remote Desktop for the cluster?
- What if I have a question that is not listed here?
- What if I want to report a problem?
- Which version of Python can be used?
- Which should I use: Anaconda, Miniconda, Miniforge, or Micromamba?
- How do I monitor live CPU/GPU/memory/disk utilization?
- How do I check my GPFS filesystem usage and quota situation?
- Why does the Slurm squeue command show (MaxCpuPerAccount), (MaxJobsPerAccount) or (MaxGRESPerAccount) next to the submitted job?
- Why does the Slurm squeue command show (QOSMaxJobsPerUserLimit) next to the submitted job?
- Why does the Slurm squeue command show (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions) next to the submitted job?
How do I register myself to use the HPC resources?
Please consult HPC Project Membership (HPC-Zugriff).
How do I get access to the LiCCA or ALCC resources?
Please consult HPC Project Membership (HPC-Zugriff).
What kind of resources are available on LiCCA?
Please consult Cluster overview page.
What kind of resources are available on ALCC?
Please consult Cluster overview page.
How do I acknowledge the usage of HPC resources on LiCCA in publications?
Please consult Acknowledgement.
How do I acknowledge the usage of HPC resources on ALCC in publications?
Please consult Acknowledgement - ALCC.
What Slurm Partitions (Queues) are available on LiCCA?
See Slurm Queues.
What Slurm Partitions (Queues) are available on ALCC?
See Slurm Queues.
What is Slurm?
Slurm stands for Simple Linux Utility for Resource Management.
For instructions on using the Slurm batch system at the University of Augsburg HPC facility, please consult Submitting Jobs (Slurm Batch System).
The official documentation can be found at https://slurm.schedmd.com/documentation.html.
How do I use the Slurm batch system?
Please consult the simplified user manual at Slurm 101, and also Submitting Jobs (Slurm Batch System).
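As a quick orientation, the typical workflow is to write a batch script and submit it; the commands below are standard Slurm, and job.sh is a placeholder for your own script:

```bash
sbatch job.sh        # submit the batch script; Slurm prints the assigned job ID
squeue -u $USER      # list your pending and running jobs
scancel <jobid>      # cancel a job by its ID if needed
```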
How do I submit serial calculations?
See Submitting Serial Jobs.
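A minimal serial batch script might look like the sketch below; the partition name and program are placeholders, so check Slurm Queues for the actual partition names:

```bash
#!/bin/bash
#SBATCH --job-name=serial-test   # name shown in squeue
#SBATCH --ntasks=1               # a serial job is a single task
#SBATCH --cpus-per-task=1        # running on one core
#SBATCH --mem=2G                 # adjust to the program's needs
#SBATCH --time=01:00:00          # walltime limit (hh:mm:ss)
#SBATCH --partition=epyc         # hypothetical partition name

./my_serial_program              # placeholder executable
```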
How do I run multithreaded calculations?
See Submitting Multithreaded Jobs.
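For illustration, a multithreaded (e.g. OpenMP) job requests one task with several cores; the sketch below uses placeholder partition and program names:

```bash
#!/bin/bash
#SBATCH --job-name=omp-test
#SBATCH --ntasks=1               # one process ...
#SBATCH --cpus-per-task=16       # ... with 16 cores for its threads
#SBATCH --mem=8G
#SBATCH --time=01:00:00
#SBATCH --partition=epyc         # hypothetical partition name

# let the OpenMP runtime use exactly the allocated cores
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_openmp_program              # placeholder executable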
How do I run parallel calculations on several nodes?
See Submitting Parallel Jobs (MPI/OpenMP).
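A multi-node MPI job requests tasks across several nodes and launches the ranks with srun; this is a sketch with placeholder values, assuming Slurm's MPI integration is configured as usual:

```bash
#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --nodes=2                # spread the job over two nodes
#SBATCH --ntasks-per-node=64     # MPI ranks per node (adjust to the hardware)
#SBATCH --time=02:00:00
#SBATCH --partition=epyc         # hypothetical partition name

# srun starts one MPI rank per allocated Slurm task
srun ./my_mpi_program            # placeholder executable
```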
How do I run GPU-based calculations?
See Submitting GPU Jobs.
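GPUs are requested via --gres; the sketch below assumes one A100 (the GPU type string and partition name are inferred from the scontrol example later on this page and may differ for your jobs):

```bash
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:a100:1        # one A100 GPU; type string is an assumption
#SBATCH --partition=epyc-gpu     # hypothetical partition name
#SBATCH --time=01:00:00

nvidia-smi                       # confirm the allocated GPU is visible
./my_gpu_program                 # placeholder executable
```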
How do I check the current Slurm schedule and queue?
See Slurm 101.
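A few standard Slurm commands for inspecting the schedule (the partition name is a placeholder):

```bash
squeue -u $USER            # your jobs with state and pending reason
squeue -p epyc --start     # estimated start times of pending jobs in a partition
sinfo -s                   # summary of partitions and node availability
```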
Is there some kind of Remote Desktop for the cluster?
Please consult Connect to the Cluster.
What if I have a question that is not listed here?
Please consult Service desk.
What if I want to report a problem?
Please consult Service desk.
Which version of Python can be used?
Please consult Python and conda package management.
Which should I use: Anaconda, Miniconda, Miniforge, or Micromamba?
Please consult Python and conda package management.
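As an illustrative sketch only (the linked page is authoritative), creating an isolated environment with micromamba looks like this, assuming micromamba is installed and its shell hook has been initialized:

```bash
# create a lightweight conda environment with a pinned Python version
micromamba create -n myenv -c conda-forge python=3.12 numpy
micromamba activate myenv
python --version
```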
How do I monitor live CPU/GPU/memory/disk utilization?
Please consult Live resource utilization monitoring.
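Alongside the tools described there, standard Slurm can report live statistics for a running job via sstat, for example:

```bash
squeue -u $USER        # find the ID of the running job
# statistics are reported per job step; the job must still be running
sstat -j <jobid> --format=JobID,AveCPU,MaxRSS,MaxDiskRead,MaxDiskWrite
```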
How do I check my GPFS filesystem usage and quota situation?
Please consult "Quota regulations and management" under Parallel File System (GPFS).
Why does the Slurm squeue command show (MaxCpuPerAccount), (MaxJobsPerAccount) or (MaxGRESPerAccount) next to the submitted job?
To ensure fair usage of cluster resources, each Slurm partition has a QoS (Quality of Service) that limits, at the HPC project (account) level, the cluster resources available in that partition.
- (MaxCpuPerAccount): the number of cores requested by the entire HPC project currently exceeds this limit;
- (MaxJobsPerAccount): the number of jobs submitted by the entire HPC project currently exceeds this limit;
- (MaxGRESPerAccount): the amount of other resources (for example, GPUs) requested by the entire HPC project currently exceeds this limit.
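Affected jobs stay pending and start automatically once the project's usage drops below the limit. You can inspect the per-account and per-user caps of each QoS yourself with sacctmgr:

```bash
# show per-account (PA) and per-user (PU) caps for each QoS
sacctmgr show qos format=Name,MaxTRESPA,MaxJobsPA,MaxJobsPU
```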
Why does the Slurm squeue command show (QOSMaxJobsPerUserLimit) next to the submitted job?
To ensure fair usage of cluster resources, each Slurm partition has a QoS (Quality of Service) that limits, at the HPC project (account) level, the cluster resources available in that partition.
- (QOSMaxJobsPerUserLimit): the number of jobs submitted by the user currently exceeds this limit.
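To see which of your jobs are held back and why, the reason column of squeue can be printed explicitly:

```bash
# %R prints the pending reason (or the node list for running jobs)
squeue -u $USER -o "%.12i %.12P %.8T %R"
```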
Why does the Slurm squeue command show (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions) next to the submitted job?
As the message states, the nodes required for the job are down, drained, or reserved for jobs in higher-priority partitions.
In most cases, the "Reason" field in the last line of the output of
scontrol show node <nodename>
provides additional information about the status of the node.
For example:

```
# scontrol show node licca047
NodeName=licca047 Arch=x86_64 CoresPerSocket=64
   CPUAlloc=0 CPUEfctv=128 CPUTot=128 CPULoad=0.08
   AvailableFeatures=Epyc-7713
   ActiveFeatures=Epyc-7713
   Gres=gpu:a100:3(S:1)
   NodeAddr=licca-e-047 NodeHostName=licca047 Version=25.05.0
   OS=Linux 6.8.0-54-generic #56-Ubuntu SMP PREEMPT_DYNAMIC Sat Feb 8 00:37:57 UTC 2025
   RealMemory=1021000 AllocMem=0 FreeMem=1020914 Sockets=2 Boards=1
   State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=epyc-gpu-test
   BootTime=2025-09-12T13:39:18 SlurmdStartTime=2025-09-12T13:43:30
   LastBusyTime=2025-09-11T20:35:41 ResumeAfterTime=None
   CfgTRES=cpu=128,mem=1021000M,billing=138,gres/gpu=3,gres/gpu:a100=3
   AllocTRES=
   CurrentWatts=0 AveWatts=0
   Reason=Reserved for workshop [root@2025-09-12T13:04:18]
```
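To get an overview of all unavailable nodes and their reasons at once, sinfo can be used (the partition name is taken from the example above):

```bash
sinfo -R                       # list down/drained nodes together with their Reason
sinfo -p epyc-gpu-test -N -l   # per-node state within one partition
```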