Cluster is not functioning or there are cluster-related issues

If the cluster is not functioning or you encounter cluster-related issues, please contact hpc-service at rz.uni-augsburg.de.

First, please consult the following guidelines for writing support requests.

Writing good support requests is not only good for the support team, it also benefits you!

The easier it is to understand your intent and the problem you are observing, the faster we will be able to provide you with a targeted response. Below, you will find a list of best practices.
Please take a minute to read this document to improve your communication with us. Thank you!

Do not send support requests to staff members directly

Always submit your HPC support requests via the Service Desk to make sure that someone takes care of them promptly.

  • The staff will pick them up there and get back to you shortly.
  • Service requests are tracked centrally and have higher visibility for everyone.
  • Some service requests require staff with a specific specialization, and some staff members work in support only part-time.

If you email a member of the support staff directly without being asked to, that person might be busy with something else, on vacation or on sick leave, and your request will be delayed.

If you submit an HPC service request, please be responsive!

It is not a good idea to submit an HPC support request and then leave for a business trip or vacation for the next two weeks.

Even if you follow the advice on writing a clear and understandable HPC service request given on this page, some questions may remain unanswered, you may need to test something on your end, or additional information may be needed for further investigation. So please be prepared to answer follow-up questions from our staff members.

Provide a descriptive explanation of your issue

Give your request a descriptive subject

Your subject line should be descriptive and self-explanatory.

Example of a non-descriptive subject (which we nevertheless receive frequently):

 It does not work!

Something like this is not a very good subject for a support request, as it could apply to virtually any support request we receive.

  • The subject is the first thing that one sees.
  • Support requests are classified according to subjects even before the support request is opened.
  • It helps to assign the support request to the most knowledgeable expert for your issue.

Therefore, it is good practice to include the following information in the subject line:

  • What is the name of your application?
  • If it is your own program, at least specify the scientific domain of your application, e.g., DFT, FEM, Geo, Life Science, etc.
  • On which HPC system/Linux cluster does the issue occur?

So a better example of a descriptive subject for a support request might be something like this:

VASP DFT code: Issue on LICCA EPYC QUEUE with HDF5 library

Provide the most important information directly in your HPC Service Request

In most cases, the HPC Service Requests we receive do not provide sufficient information to start analyzing the problem. To substantially speed up work on your reported problem, please provide the following information in your very first HPC service request message:

  • Are you trying this for the very first time, or did it all work perfectly before and now it suddenly does not work for you?
  • Application name?
  • If you installed the application yourself, please provide the installation path!
  • Name of the HPC system at the University of Augsburg you are currently working on (LiCCA, GPUs involved, environment modules loaded, ...)
  • A copy of your Slurm script, included as text in a code block in the support request (or as a file attachment if the script is large). Please do not attach it as images/screenshots!
  • The path of the Slurm script submission directory (where you call "sbatch ...") and, if different, the path of your working directory with the input files of your job. If both directories are publicly accessible, the Slurm JobID can serve as an alternative.
  • The received error messages as text files, not as images/screenshots, if applicable.
  • In case of license errors in commercial software applications (e.g., Comsol, VASP, Turbomole, etc.):
    • Which license server are you accessing?
    • Which type of license do you want to receive (Research, Teaching,...)?
    • Did you have access to this license in the past, or are you requesting access to this type of license for the first time?
    • Name of your user account (UID, whoami, $USER)
    • Name of your computer (hostname, $HOSTNAME)
  • In case of SSH access issues, please provide the full output of the ssh connection attempt with increased verbosity:
    • ssh -vvv <your_target_host> -l $USER
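Much of the basic information listed above can be gathered with standard shell commands and pasted into the ticket as text. The following is a minimal bash sketch, not an official tool: the output file name support_info.txt is a hypothetical choice, and the module command is only available if an environment module system is initialized in your shell.

```shell
#!/usr/bin/env bash
# Minimal sketch: collect basic facts for an HPC service request
# into a plain-text file. The file name is a hypothetical choice.
out="support_info.txt"

{
  echo "User account : ${USER:-$(whoami)}"
  echo "Host name    : ${HOSTNAME:-$(uname -n)}"
  echo "Current dir  : $(pwd)"
  echo "Date         : $(date)"
  # SLURM_JOB_ID is only set when this runs inside a Slurm job.
  echo "Slurm JobID  : ${SLURM_JOB_ID:-not running inside a Slurm job}"
  echo "--- Loaded environment modules ---"
  # 'module' is usually a shell function set up by the module system;
  # it may be missing in shells where that system is not initialized.
  if command -v module >/dev/null 2>&1; then
    module list 2>&1
  else
    echo "(module command not available in this shell)"
  fi
} > "$out"

echo "Wrote $out"
```

Open support_info.txt, check it, and copy its contents into your service request together with your Slurm script.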

Sending a screenshot of your ssh terminal (images: jpg, png, tiff, ...) or images of error messages on your monitor is much less helpful! Commands and error messages cannot simply be copied and pasted from an image, which unnecessarily slows down the investigation of your problem. Your output does not have to "look good" at all: a simple text-based copy and paste directly into the ticket works best.
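To hand over the verbose SSH output as text instead of a screenshot, you can copy it into a file while it is printed. This is a sketch assuming OpenSSH and bash: your_target_host is a placeholder as in the command above, the log file name ssh_debug.txt is a hypothetical choice, and the remote command exit is an addition that simply closes the session again once a connection has been established.

```shell
#!/usr/bin/env bash
# Sketch: save the full output of a verbose SSH connection attempt as text.
# TARGET_HOST defaults to a placeholder; set it to the login node you use.
TARGET_HOST="${TARGET_HOST:-your_target_host}"

# ssh prints its -vvv debug messages on stderr, so stderr is merged into
# stdout (2>&1) before tee shows the output and copies it to the file.
# The remote command 'exit' ends the session right after a successful login.
ssh -vvv "$TARGET_HOST" -l "${USER:-$(whoami)}" exit 2>&1 | tee ssh_debug.txt
```

You can then attach ssh_debug.txt to the ticket or paste its contents directly.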

New problem – new HPC Service Request

  • Please do not use the same HPC service request to send a whole catalog of unrelated issues. And do not send support requests by responding to unrelated and older (already resolved) issues.
  • Each reported issue receives a ticket number. You will see that number in the reply, and you will receive an email notification as soon as a staff member replies.
  • Responding to unrelated issues means that your inquiry will be filed with the wrong topic and is at risk of being overlooked or ending up with the wrong HPC staff member.
  • Combining multiple unrelated issues into one HPC service request significantly slows down the analysis process and delays responses to you, as several different experts may be involved and a single thread cannot be shared with them simultaneously.

Do not treat the service team as a "human interface" to the documentation or as simple "let me Google that for you" wizards.

Have you searched the internet with the exact error message and the name of your application...? Other scientists might have had exactly the same problem and solved it successfully. By the way, that is almost always how we start investigating, too...

Reporting an issue

Specify the employed environment

Did you yourself or your colleague compile the code? Where and how?

  • Which modules were loaded before code execution?
  • Do you use an unmodified HPC user environment, or do you load "dozens" of modules already in your ~/.bashrc or ~/.profile?
  • Do you use shell aliases?

If you use non-standard modules, user environments, aliases, etc. and do not provide us with information about them, we waste time debugging in a different environment. Or we may not even be able to reproduce the reported problem.
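One way to give us this information is to run a few commands in the same interactive shell in which you start your program and attach the result as text. This is a sketch assuming bash: the file name env_report.txt is hypothetical, and the grep only finds start-up file lines that contain the word "module". Aliases defined in ~/.bashrc only show up when these lines are run in an interactive shell, not inside a separate script.

```shell
# Sketch: record parts of the user environment that can change how code runs.
# Run these lines in the interactive shell you normally work in.
# The output file name env_report.txt is a hypothetical choice.
{
  echo "--- Shell aliases ---"
  alias
  echo "--- Lines mentioning 'module' in shell start-up files ---"
  for f in "$HOME/.bashrc" "$HOME/.profile"; do
    if [ -f "$f" ]; then
      # -H prints the file name, -n the line number of each match.
      grep -Hn "module" "$f"
    fi
  done
  echo "--- Search paths ---"
  echo "PATH=$PATH"
  echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}"
} > env_report.txt 2>&1
```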

Simple cases: be specific, include commands and errors

  • Whatever you do, do not simply state that "X did not work".
  • Specify exactly the commands you ran, the environment (see above), and the output error messages.
  • The actual error messages are of great importance: include the entire output and do not truncate it, even if you think parts of it are not important for our analysis. You can easily attach it as a text file to your HPC service request.
The better you describe the problem, the less we have to guess and ask.
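For example, you can rerun the failing command exactly as you normally call it and redirect everything it prints, including error messages on stderr, into a text file that you then attach. This is a sketch assuming bash; ./my_app and input.dat are hypothetical placeholders for your own program and input.

```shell
# Sketch: keep the exact command, its complete output (stdout and stderr)
# and its exit code in one text file for the support request.
# ./my_app and input.dat are hypothetical placeholders.
{
  echo "\$ ./my_app input.dat"
  ./my_app input.dat
  echo "exit code: $?"
} > run_output.txt 2>&1
```

Because the redirection applies to the whole group, even messages printed by the shell itself (for example "No such file or directory") end up in run_output.txt.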

Sometimes it is enough to see the actual error message to give a useful answer. In all but the simplest cases, however, we need to be able to reproduce the problem, so please try to make it reproducible, as described in the following points.

Complex cases: Create an example that reproduces the problem

Create an example that we can ideally just copy and run under our own accounts to reproduce the problem. Otherwise, it is very time-consuming for the support team to write input files and run scripts based on your possibly incomplete description. See also the next point. Provide us with this example, e.g., in a separate folder in your file storage, and send us the path. We do not browse your files or interfere with your running simulations without your explicit permission.
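A reproducer folder can be assembled with a few shell commands. This is a sketch assuming bash; the directory name support_reproducer and the files job.slurm and input.dat are hypothetical placeholders for your own job script and inputs. How staff get read access to that directory depends on where it is stored and is not covered here.

```shell
# Sketch: copy everything needed to reproduce the problem into one
# separate directory and print its absolute path for the ticket.
# REPRO_DIR and the file names below are hypothetical placeholders;
# list your own job script and input files instead.
REPRO_DIR="support_reproducer"
FILES=(job.slurm input.dat)

mkdir -p "$REPRO_DIR"
for f in "${FILES[@]}"; do
  if [ -e "$f" ]; then
    cp -r "$f" "$REPRO_DIR"/
  else
    echo "warning: $f not found, not copied" >&2
  fi
done

# Absolute path to paste into the HPC service request.
echo "Reproducer directory: $(cd "$REPRO_DIR" && pwd)"
```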

Make the example as small and fast as possible

Your calculation crashes after days on hundreds of CPU cores. You might be tempted to open a support request immediately with this example, but that is not a good idea. Before you send a support request, try to reduce the example first. Most likely, you will be able to reproduce the crash with a much smaller example (less CPU time, fewer cores, a smaller system size, grid or input data). It is much easier to schedule and debug a problem that crashes after a few seconds than a run that crashes after many hours. Of course, this requires some effort on your side, but that is what we expect from you to create a meaningful support request. Often, when you narrow down the problem, the problem and the solution become clear even before you write the support request.
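As an illustration, a reduced test job could keep the same program and type of input but request only a fraction of the resources and a short time limit. The sketch below writes such a Slurm job script; the job name, the resource values, ./my_app and small_input.dat are hypothetical and must be adapted, and any partition or account options you may need on LiCCA are intentionally left out.

```shell
# Sketch: write a reduced Slurm job script for reproducing a crash quickly.
# Resource values, job name, ./my_app and small_input.dat are hypothetical.
cat > repro_small.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=repro-small
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:15:00
#SBATCH --output=repro-small-%j.out

# Same program and kind of input as the failing run, but a much smaller case.
srun ./my_app small_input.dat
EOF
echo "Created repro_small.slurm"
```

Submit it with "sbatch repro_small.slurm". The %j in the output file name is replaced by the Slurm JobID, which you can then quote in your service request.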

The XY problem

This is a classic user support issue. Please read this. Often we know the solution, but sometimes we do not know the problem.

In short (quoting from the above link):

  • User wants to do X;
  • User does not know how to do X, but thinks they can fumble their way to a solution if they can just manage to do Y;
  • User does not know how to do Y either;
  • User asks for help with Y;
  • Others try to help user with Y, but are confused because Y seems like a strange problem to want to solve;
  • After much interaction and wasted time, it finally becomes clear that the user really wants help with X, and that Y was not even a suitable solution for X.

To avoid the XY problem, if you struggle with Y, but really what you are after is X, please also tell us about X. Tell us what you really want to achieve. Solving Y can take a long time. There are cases where after enormous effort on Y one realizes that the user wanted X and that Y was not the best way to achieve X on the available HPC resources, while at the same time the problem X could have been solved with a little effort and consulting by using method Z.

How to use the Cluster resources

For information on how to use the cluster resources, please consult the Knowledge Base für wissenschaftliches Rechnen (HPC) Startseite (home page of the Knowledge Base for Scientific Computing).

Available Trainings

For the available training courses, see What trainings are available.
