- Created by Mikheil Sekania, last modified on April 3, 2024
Cluster is not functioning or there are some cluster related issues

If the cluster is not functioning or you encounter other cluster-related issues, please contact hpc-service at rz.uni-augsburg.de.

Writing good support requests is not only good for the support team, it is also profitable for you!

First, please consult

Do not send support requests to staff members directly

Always submit your HPC support requests via the Service Desk to make sure that someone takes care of them promptly. If you email a member of the support staff directly without having been asked to, that person might be busy with something else or be on vacation or sick leave, and your request will be delayed.

If you submit an HPC service request - please be responsive!

It is not a good idea to submit an HPC support request and then leave for a business trip or vacation for the next two weeks. Even if you follow the advice on writing a clear and understandable HPC service request given on this page, some questions may remain unanswered, something may be left for you to test, or additional information may be needed for further investigation. So please be prepared to answer follow-up questions from our staff members.

Provide a descriptive explanation of your issue

Give your request a descriptive subject

Your subject line should be descriptive and self-explanatory. Here is an example of a subject that is not very descriptive (but nevertheless frequently received):

It does not work!

A subject like this is not helpful, as it could apply to virtually any support request we receive. It is therefore good practice to mention the key facts already in the subject line, such as the application, the cluster or queue, and the nature of the problem. A better example of a descriptive subject might be:

VASP DFT code: Issue on LICCA EPYC QUEUE with HDF5 library

Provide the most important information directly in your HPC Service Request

In most cases, the HPC service requests we receive do not provide sufficient information to start analyzing the problem. To substantially speed up work on your problem, please provide the following information already in your first HPC service request message:
- commands, output, and error messages as text or text files, not as images/screenshots, if applicable
- for problems connecting via SSH, the verbose output of ssh -vvv <your_target_host> -l $USER

Sending a screenshot of your ssh terminal (images: jpg, png, tiff, ...) or of error messages on your monitor is much less helpful! From an image, one cannot simply copy and paste commands or error messages, which unnecessarily slows down the investigation of your problem. Your sample output does not have to "look good" at all: a simple text-based copy and paste directly into the ticket does the job best.

New problem – new HPC Service Request

Do not treat the service team as a "human interface" to the documentation or as simple "let me Google that for you" wizards.

Have you searched the internet for the exact error message and the name of your application? Other scientists might have had exactly the same problem and solved it successfully. By the way, that is almost always how we start investigating, too.

Reporting an issue

Specify the employed environment

Did you or a colleague compile the code yourselves? Where and how? Have you customized your ~/.bashrc or ~/.profile? If you use non-standard modules, user environments, aliases, etc. and do not tell us about them, we waste time debugging in a different environment, or we may not even be able to reproduce the reported problem.

Simple cases: be specific, include commands and errors

Sometimes seeing the actual error message is enough for us to give a useful answer. In all but the simplest cases, however, you need to make the problem reproducible, which you should definitely try to do, as described in the following points.

Complex cases: Create an example that reproduces the problem

Create an example that we can ideally just copy and run under our own accounts to reproduce the problem. Otherwise, it is very time-consuming for the support team to write input files and run scripts based on your possibly incomplete description. See also the next point.

Provide us with this example, e.g., in a separate folder in your file storage, and send us the path. We do not browse your files or interfere with your running simulations without your explicit permission.

Make the example as small and fast as possible

Suppose your calculation crashes after several days on hundreds of CPU cores. You might be tempted to open a support request immediately with this example, but that is not a good idea. Before you send a support request, try to reduce the example first. You will most likely be able to reproduce the crash with a much smaller example (less CPU time, fewer cores, a smaller system size, grid, or input data). It is much easier to schedule and debug a problem that crashes after a few seconds than a run that crashes after many hours. Of course, this requires some effort on your side, but that is what we expect you to do to create a meaningful support request. Often, while you narrow down the problem, the problem and its solution become clear even before you write the support request.

The XY problem

This is a classic user support issue. Please read this. Often we know the solution, but sometimes we do not know the problem. In short (quoting from the above link):

To avoid the XY problem, if you struggle with Y, but really what you are after is X, please also tell us about X. Tell us what you really want to achieve.

Solving Y can take a long time. In some cases, after an enormous effort on Y, it turns out that the user actually wanted X, that Y was not the best way to achieve X on the available HPC resources, and that X could have been solved with little effort and some consulting by using method Z.

How to use the Cluster resources

For information on how to use the cluster resources, please consult Knowledge Base für wissenschaftliches Rechnen (HPC) Startseite.

Available Trainings

For available trainings, see What trainings are available.
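The section "Specify the employed environment" asks about loaded modules, how the code was built, and customized shell startup files. The following Bash script is a minimal sketch, not an official HPC service tool, that gathers some of these details into a plain-text file you can paste into a service request as text. The output file name support_info.txt is hypothetical, and the script assumes, although the page does not say so, that the cluster uses Environment Modules or Lmod (both set the LOADEDMODULES variable).

```shell
#!/usr/bin/env bash
# Minimal sketch: gather environment details for an HPC service request.
# Assumptions (not from the original page): the output file name below is
# hypothetical, and the cluster may use Environment Modules or Lmod.
out="${1:-support_info.txt}"

{
  echo "== host =="
  uname -n
  echo
  echo "== date =="
  date
  echo
  echo "== user and shell =="
  echo "user:  ${USER:-unknown}"
  echo "shell: ${SHELL:-unknown}"
  echo
  echo "== working directory =="
  pwd
  echo
  echo "== loaded modules =="
  # 'module' is normally a shell function and 'module list' writes to stderr,
  # so stderr is redirected. Fall back to LOADEDMODULES if the function is missing.
  if command -v module >/dev/null 2>&1; then
    module list 2>&1
  else
    echo "${LOADEDMODULES:-no module information available}"
  fi
  echo
  echo "== shell startup files (existence and last modification) =="
  for f in "${HOME:-.}/.bashrc" "${HOME:-.}/.profile"; do
    if [ -f "$f" ]; then
      ls -l "$f"
    else
      echo "$f: not present"
    fi
  done
} > "$out"

echo "Environment information written to $out"
```

Run it in the same environment where the problem occurs, for example as bash collect_info.sh (the script name is also hypothetical), review the resulting file, and paste its contents into the ticket as text rather than as a screenshot.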