We announced a maintenance window for both clusters,
ALCC and LiCCA, to update Slurm to version 25.11.
One of the main reasons is the improved
GPU allocation for Slurm jobs,
which is broken in the current version 25.05.

We might still have to adjust the Slurm configuration
for GPU job handling in the days following the update,
which may mean draining and restarting the Slurm
daemons again.

We will, at least temporarily, lower the TimeLimit
in the GPU partitions from 3 to 2 days.
This may be inconvenient for users with long-running jobs,
but it is a good alternative to cancelling or killing jobs
when the system has to be restarted.
