Monday, 18.5.: final synchronization done, clusters ALCC and LiCCA are back online. In case of problems, please open a ticket with our Service-Desk.
15.5.: migration still running, estimate for finishing: Monday, 18.5.
13.5., 9:00: final migration steps with data synchronization have started;
no login is possible until the migration is complete.
The December module updates and deprecations have been rolled out today. Please look out for deprecation warnings in your Slurm output.
Please be aware of the following module changes (if default appears at the end, this is the new default!):
New/updated scientific Modules:
====================
elk/10.7.8-impi2021.10-intel2023.2 (default)
gromacs/2025.4-ompi5.0-gcc13.2-mkl2023.2-cuda12.9
nwchem/7.3.1-ompi5.0-cf (default)
octave/10.3.0-cf (default)
orca/6.1.1 (default)
qchem/6.3.1
qchem/6.4.0 (default)
siesta/5.4.1-ompi5.0-cf (default)
New/updated common Modules:
====================
cmake/3.31.10
cmake/4.2.1 (default)
ffmpeg/8.0.1 (default)
meson/1.10.0 (default)
meson/1.9.2
ninja/1.13.2 (default)
anaconda/2025.12 (default)
apptainer/1.4.5 (default)
cudnn/cu11x/9.10.2.21 (default)
cudnn/cu12x/9.17.0.29 (default)
cuquantum/cu11x/25.06.0.10 (default)
cuquantum/cu12x/25.11.1.11 (default)
cutensor/cu11x/2.2.0.0 (default)
cutensor/cu12x/2.4.1.4 (default)
gdrcopy/2.5.1
go/1.24.11
go/1.25.5 (default)
julia/1.10.10
julia/1.12.2 (default)
micromamba/2.4.0 (default)
miniforge/25.11.0 (default)
nccl/cu12.9/2.27.7
nccl/cu12.9/2.28.9 (default)
openjdk/11.0.29+7
openjdk/17.0.17+10
openjdk/21.0.9+10
openjdk/25.0.1+8 (default)
openjdk/8.u472-b08
Deprecated Modules (to be hidden on 15th of January 2026 and removed on 30th of January 2026):
=================================================
cmake/4.0.3: Please use cmake/4.2.1 or higher!
anaconda/2024.06: Please use anaconda/2024.10 or higher!
apptainer/1.3.5: Please use apptainer/1.3.6 or higher!
julia/1.10.8: Please use julia/1.10.10 or higher!
julia/1.11.3: Please use julia/1.12.2 or higher!
elk/10.5.16-impi2021.10-intel2023.2: Please use elk/10.7.8-impi2021.10-intel2023.2 or higher!
elk/10.6.2-impi2021.10-intel2023.2: Please use elk/10.7.8-impi2021.10-intel2023.2 or higher!
ffmpeg/6.1: Please use ffmpeg/7.0.1 or higher!
meson/1.4.2: Please use meson/1.8.4 or higher!
meson/1.5.2: Please use meson/1.8.4 or higher!
meson/1.7.2: Please use meson/1.8.4 or higher!
micromamba/2.0.5: Please use micromamba/2.4.0 or higher!
micromamba/2.2.0: Please use micromamba/2.4.0 or higher!
micromamba/2.3.0: Please use micromamba/2.4.0 or higher!
qchem/6.3.0: Please use qchem/6.3.1 or higher!
octave/8.4.0-cf: Please use octave/9.1.0-cf or higher!
siesta/5.2.0-ompi4.1-cf: Please use siesta/5.4.1-ompi5.0-cf or higher!
siesta/5.4.0-ompi5.0-cf: Please use siesta/5.4.1-ompi5.0-cf or higher!
comsol/6.1: Please use comsol/6.2 or higher!
cuda/11.6.2: Please use cuda/11.8.0 or higher!
cuda/12.1.1: Please use cuda/12.5.1 or higher!
cuda/12.2.2: Please use cuda/12.5.1 or higher!
cuda/12.3.2: Please use cuda/12.5.1 or higher!
cuda/12.4.1: Please use cuda/12.5.1 or higher!
cuda/12.6.2: Please use cuda/12.6.3 or higher!
cuda/12.8.0: Please use cuda/12.8.1 or higher!
cuda-compat/12.9.1: The current cuda driver is already newer!
nccl/cu12.2/2.21.5: Please use nccl/cu12.5/2.21.5 or higher!
nccl/cu12.4/2.21.5: Please use nccl/cu12.5/2.21.5 or higher!
cudnn/cu11x/8.9.7.29: Please use cudnn/cu11x/9.10.2.21 or higher!
cudnn/cu11x/9.0.0.312: Please use cudnn/cu11x/9.10.2.21 or higher!
cudnn/cu11x/9.1.1.17: Please use cudnn/cu11x/9.10.2.21 or higher!
cudnn/cu11x/9.2.1.18: Please use cudnn/cu11x/9.10.2.21 or higher!
cudnn/cu11x/9.3.0.75: Please use cudnn/cu11x/9.10.2.21 or higher!
cudnn/cu11x/9.4.0.58: Please use cudnn/cu11x/9.10.2.21 or higher!
cudnn/cu11x/9.5.1.17: Please use cudnn/cu11x/9.10.2.21 or higher!
cudnn/cu12x/8.9.7.29: Please use cudnn/cu12x/9.17.0.29 or higher!
cudnn/cu12x/9.0.0.312: Please use cudnn/cu12x/9.17.0.29 or higher!
cudnn/cu12x/9.1.1.17: Please use cudnn/cu12x/9.17.0.29 or higher!
cudnn/cu12x/9.2.1.18: Please use cudnn/cu12x/9.17.0.29 or higher!
cudnn/cu12x/9.3.0.75: Please use cudnn/cu12x/9.17.0.29 or higher!
cudnn/cu12x/9.4.0.58: Please use cudnn/cu12x/9.17.0.29 or higher!
cudnn/cu12x/9.5.1.17: Please use cudnn/cu12x/9.17.0.29 or higher!
cutensor/cu11x/2.0.1.2: Please use cutensor/cu11x/2.2.0.0 or higher!
cutensor/cu11x/2.0.2.5: Please use cutensor/cu11x/2.2.0.0 or higher!
cutensor/cu12x/2.0.1.2: Please use cutensor/cu12x/2.4.1.4 or higher!
cutensor/cu12x/2.0.2.5: Please use cutensor/cu12x/2.4.1.4 or higher!
cuquantum/cu11x/24.03.0.4: Please use cuquantum/cu11x/25.06.0.10 or higher!
cuquantum/cu11x/24.08.0.5: Please use cuquantum/cu11x/25.06.0.10 or higher!
cuquantum/cu11x/24.11.0.21: Please use cuquantum/cu11x/25.06.0.10 or higher!
cuquantum/cu12x/24.03.0.4: Please use cuquantum/cu12x/25.11.1.11 or higher!
cuquantum/cu12x/24.08.0.5: Please use cuquantum/cu12x/25.11.1.11 or higher!
cuquantum/cu12x/24.11.0.21: Please use cuquantum/cu12x/25.11.1.11 or higher!
If you experience problems with any module, please let us know!
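To find affected job scripts before the deprecated modules are hidden, the scripts can be scanned for `module load` lines. The following is a minimal Python sketch, not an official tool: the mapping contains only a few entries copied from the list above, and the function name `find_deprecated` is made up for illustration. Extend the dictionary with the modules you actually use.

```python
import re

# A few entries copied from the deprecation list above:
# deprecated module -> suggested minimum replacement.
DEPRECATED = {
    "cmake/4.0.3": "cmake/4.2.1",
    "anaconda/2024.06": "anaconda/2024.10",
    "apptainer/1.3.5": "apptainer/1.3.6",
    "julia/1.11.3": "julia/1.12.2",
    "cuda/12.4.1": "cuda/12.5.1",
    "micromamba/2.3.0": "micromamba/2.4.0",
}

# Matches "module load a b c" and "module add a b c".
MODULE_LOAD = re.compile(r"^\s*module\s+(?:load|add)\s+(.+)$")


def find_deprecated(script_text):
    """Return (line_number, module, replacement) for each deprecated module loaded."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore shell comments and #SBATCH lines
        match = MODULE_LOAD.match(code)
        if not match:
            continue
        for name in match.group(1).split():
            if name in DEPRECATED:
                hits.append((lineno, name, DEPRECATED[name]))
    return hits


example = """#!/bin/bash
#SBATCH --time=01:00:00
module load cmake/4.0.3 julia/1.12.2
module load cuda/12.4.1  # GPU build
"""

for lineno, name, replacement in find_deprecated(example):
    print(f"line {lineno}: {name} is deprecated, use {replacement} or higher")
```

Only explicitly versioned loads are detected; a plain `module load cmake` follows the default and is not affected by the deprecations.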
Both clusters ALCC and LiCCA are back online.
We announced a maintenance window for both clusters
ALCC and LiCCA to update the Slurm version to 25.11.
One of the main reasons is the improved
GPU allocation for Slurm jobs,
which is broken in the current version 25.05.
We might still have to adjust the Slurm configuration
for GPU job handling in the days following the update,
which may mean draining the partitions and restarting
the Slurm daemons again.
We will at least temporarily lower the TimeLimit
in the GPU partitions from 3 to 2 days.
This might inconvenience users with long-running jobs,
but it is a better alternative to cancelling or killing jobs
when the system has to be restarted.
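To check whether the `--time` request of an existing GPU job still fits under the lowered TimeLimit, the value can be compared against 2 days. The following is a minimal Python sketch, assuming the time formats documented for sbatch (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, days-hours:minutes:seconds). The function names are illustrative and not part of any cluster tooling.

```python
GPU_TIME_LIMIT_SECONDS = 2 * 24 * 3600  # the temporarily lowered limit of 2 days


def slurm_time_to_seconds(spec):
    """Convert an sbatch-style time string to seconds."""
    days = 0
    if "-" in spec:
        # "days-hours[:minutes[:seconds]]"
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        fields = [int(p) for p in rest.split(":")]
        if len(fields) > 3:
            raise ValueError(f"unsupported time format: {spec!r}")
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(p) for p in spec.split(":")]
        if len(fields) == 1:      # "minutes"
            hours, minutes, seconds = 0, fields[0], 0
        elif len(fields) == 2:    # "minutes:seconds"
            hours, minutes, seconds = 0, fields[0], fields[1]
        elif len(fields) == 3:    # "hours:minutes:seconds"
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"unsupported time format: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def fits_gpu_limit(spec):
    """True if the requested time does not exceed the 2-day GPU TimeLimit."""
    return slurm_time_to_seconds(spec) <= GPU_TIME_LIMIT_SECONDS


for request in ["3-00:00:00", "2-00:00:00", "48:00:00", "2-12"]:
    status = "ok" if fits_gpu_limit(request) else "too long for the GPU partitions"
    print(f"--time={request}: {status}")
```

Jobs that need more than 2 days in total would have to be split into several runs, for example by restarting from checkpoints if the application supports them.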
Since the last major upgrade of both clusters ALCC
and LiCCA in July, we have observed problems with
Slurm jobs allocating GPUs and with our Slurm accounting
database. The recent Slurm release (version 25.11) should
fix these problems.
Maintenance schedule:
- Friday, 28 November, 9:00: all partitions are set to drain
- Monday, 1 December, 9:00: start of the Slurm update
-- GPU partitions drained
-- CPU partitions draining, running jobs continue,
job survival not guaranteed
- Monday, 1 December: we plan to resume all partitions by 18:00
- Login nodes will not be available to users until
the maintenance is finished.
The July module updates and deprecations have been rolled out today. Please look out for deprecation warnings in your Slurm output.
After the Maintenance and Upgrade to Ubuntu 24.04 there are two major changes:
- The default cuda version is now v12.8, since this is what the nvidia driver natively supports.
- The intel/2023 compilers need a compatible GNU compiler. Unfortunately intel/2023 compilers are not compatible with gcc v13 which is the new Ubuntu default.
=> When loading intel/2023, gcc v11.5 compilers will now be loaded as well (but without overriding CC, CXX or FC environment variables)
Please also be aware of the following module changes, which have just been deployed (if default appears at the end, this is the new default!):
New/updated scientific Modules:
cp2k/2025.2-ompi5.0-cuda12.8-gcc13.2
cp2k/2025.2-ompi5.0-gcc13.2 (default)
comsol/6.3.0.335 (default)
elk/10.5.16-impi2021.10-intel2023.2 (default)
gromacs/2025.2-ompi5.0-gcc13.2-mkl2023.2-cuda12.9
lammps/20240829.4-ompi5.0-cuda12.9-gcc13.2
lammps/20240829.4-ompi5.0-gcc13.2 (default)
lammps/20250722.0-ompi5.0-cuda12.9-gcc13.2
lammps/20250722.0-ompi5.0-gcc13.2
mathematica/14.2.1 (default)
orca/6.1.0 (default)
qchem/6.3.0 (default)
qe/7.4.1-impi2021.10-intel2023.2 (default)
qe/7.4.1-ompi4.1-nvhpc24.1
siesta/5.4.0-ompi5.0-cf (default)
vasp6/6.5.1-impi2021.10-intel2023.2 (default)
vasp6/6.5.1-cuda12.3-ompi4.1-nvhpc24.1
vasp6/python3.12/6.5.1-impi2021.10-intel2023.2
New/updated common Modules:
cmake/3.31.8 (default)
cmake/4.0.3
cuda-compat/12.9.1
cuda/12.8.1 (default, in line with the CUDA level of the Nvidia driver)
cuda/12.9.1
emacs/30.1 (default)
gdrcopy/2.5
meson/1.8.3 (default)
micromamba/2.3.0 (default)
nccl/cu12.8/2.26.2
ninja/1.13.2 (default)
parallel/20250622 (default)
pmix/5.0.7 (default)
R/4.4.3-cf (default)
ucc/cu11x/1.4.4 (default)
ucc/cu12x/1.4.4 (default)
ucx/cu11x/1.19.0 (default)
ucx/cu12x/1.19.0 (default)
New/updated library Modules:
hdf5/1.14.6 (for compilers gcc/9.5, gcc/11.5, gcc/13.2, intel/2021.4, intel/2023.2, intel/2024.2, nvhpc/24.1) (default)
libxc/7.0.0 (for compilers gcc/13.2, intel/2023.2, intel/2024.2) (default)
openblas/lp64/0.3.30 (for compilers gcc/9.5, gcc/11.5, gcc/13.2) (default)
openblas/ilp64/0.3.30 (for compilers gcc/9.5, gcc/11.5, gcc/13.2)
gmp/6.3.0 (for compilers gcc/13.2) (default)
sqlite3/3.50.4 (for compilers gcc/9.5, gcc/11.5, gcc/13.2) (default)
tblite/0.4.0 (for compilers gcc/13.2) (default)
New/updated MPI Modules:
openmpi/4.1.8 (for compilers gcc/9.5, gcc/11.5, gcc/13.2, intel/2021.4, intel/2023.2, intel/2024.2) (default)
openmpi/5.0.8 (for compilers gcc/9.5, gcc/11.5, gcc/13.2, intel/2021.4, intel/2023.2, intel/2024.2) (default)
hdf5/1.14.6 (for compilers gcc/9.5, gcc/11.5, gcc/13.2, intel/2021.4, intel/2023.2, intel/2024.2, nvhpc/24.1) (default)
netcdf/c/4.9.3 (for compilers gcc/13, intel/2023.2, intel/2024.2) (default)
netcdf/fortran/4.6.2 (for compilers gcc/13, intel/2023.2, intel/2024.2) (default)
pnetcdf/1.14.0 (for compilers gcc/13, intel/2023.2, intel/2024.2) (default)
If you experience problems with any module, please let us know!
The clusters will undergo a one-week maintenance shutdown from July 7th to July 11th. During this time, the system will be completely unavailable. We kindly ask that you take this into consideration when planning your computational tasks, and we appreciate your understanding and cooperation.
Reason for the Maintenance:
The reason for this planned maintenance is to implement several critical upgrades and improvements to the clusters' infrastructure. These updates are designed to enhance both the stability and security of the system. The maintenance tasks are outlined below:
1. Update to Ubuntu 24.04 on Compute and Management Nodes
To maintain compatibility with the latest software and to benefit from long-term support, all the cluster’s compute and management nodes will be upgraded to Ubuntu 24.04 LTS. This update will include new features, security patches, and enhancements that improve the stability and performance of the cluster. The upgrade will also ensure the system remains in a supported state, with access to the latest bug fixes and software optimizations.
2. HPC Data Cluster Update
The HPC data cluster file system will be upgraded to the latest supported version. This upgrade includes several improvements in performance, security, and overall system efficiency. By updating the cluster, we ensure that it remains in line with current best practices for HPC environments, enhancing both reliability and compatibility with the latest applications and workloads.
3. Upgrade to Slurm 25.05
We will also be upgrading the Slurm workload manager to version 25.05. This new version includes several significant improvements, including performance enhancements, new features for job scheduling, and bug fixes that address known issues. The upgrade will help improve the overall efficiency of job queuing and resource management within the cluster, enabling you to achieve better performance and usability.
4. Re-cabling and Adjusting of the Power Supply
Since the addition of new high-power consumption nodes to the cluster, it has become necessary to balance the overall power usage more efficiently. This requires a complete re-cabling and re-plugging of the power supply lines to ensure a more stable distribution of power across the system. As a result, the cluster will need to be shut down to perform this task safely. This operation is essential for optimizing the system’s power management and preventing potential overloads.
5. Network Isolation for Enhanced Security
As part of our ongoing efforts to enhance the security of the HPC cluster, we will be implementing additional isolation within the cluster's data network. This step is critical to protect sensitive data and improve the overall integrity of the network infrastructure. To achieve this, we will be performing re-cabling and reconfiguration of the network setup. This new network architecture will provide better isolation between internal and external traffic, mitigating any potential security risks.
Expected Downtime and Impact:
Please be aware that the maintenance period will result in complete downtime for the entire cluster. No jobs or tasks will be able to run during this time, and access to the system will be temporarily disabled. We recommend that you complete any critical tasks or jobs before the maintenance window begins.
Planned schedule:
Sunday, July 6th, 12:00 Slurm stops accepting new jobs
Monday, July 7th, 10:00 Both clusters will be shut down,
running jobs will be terminated, pending jobs will be removed
Monday, July 7th, 10:30 Start of maintenance
Friday, July 11th, 17:00 End of maintenance, clusters are back in working mode
As soon as the maintenance has finished successfully, you will be informed directly by email via this list.
What you need to do:
- If you have any data or active jobs on the cluster, please ensure that they are saved or completed before Monday, July 7th, 10:00.
- Please refrain from submitting new jobs or launching tasks after Sunday, July 6th, 12:00.
- If you need additional support or have specific questions about the maintenance process, please reach out to us at your earliest convenience.
We understand that planned downtime can cause some disruption, and we apologize for any inconvenience this may cause. Our team is committed to completing this work as efficiently as possible to minimize the impact on your research and work.
The January module updates and deprecations have been rolled out today. Please look out for deprecation warnings in your Slurm output.
New scientific Modules:
====================
plumed/2.9.3-impi2021.10-intel2023.2
cp2k/2025.1-ompi5.0-cuda12.6-gcc13.2
cp2k/2025.1-ompi5.0-gcc13.2
elk/10.2.4-impi2021.10-intel2023.2
elk/10.3.12-impi2021.10-intel2023.2
gromacs/2024.5-ompi5.0-gcc13.2-mkl2023.2-cuda12.6
turbomole/7.9.0
vasp6/6.5.0-impi2021.10-intel2023.2
vasp6/python3.12/6.5.0-impi2021.10-intel2023.2
New/updated common Modules:
====================
apptainer/1.3.6
cmake/3.31.4
cuda-compat/12.8.0
cuda/12.8.0
cuquantum/cu11x/24.11.0.21
cuquantum/cu12x/24.11.0.21
julia/1.10.8
julia/1.11.3
meson/1.7.0
micromamba/2.0.5
miniforge/24.11.3
parallel/20250122
pmix/5.0.6
ucx/cu11x/1.18.0
ucx/cu12x/1.18.0
New/updated library Modules:
====================
hdf5/1.14.5 (for compilers gcc/13.2, intel/2023.2, nvhpc/24.1)
libxc/7.0.0 (for compilers gcc/13.2, intel/2023.2, intel/2024.2)
As you know, Docker containers cannot be run natively in HPC environments, but they can be easily converted and executed using Apptainer (Documentation).
Currently an outdated apptainer installation is available without loading any module.
This installation of apptainer will be removed on 30th of November!
Please change your workflows to make use of more recent apptainer modules!
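A workflow that relied on the module-less installation mainly needs an additional `module load apptainer` before its first apptainer call. The following minimal Python sketch generates the shell lines for converting a Docker image and running a command in it. The helper name `apptainer_commands`, the image `ubuntu:24.04` and the file name `ubuntu.sif` are made-up examples; `apptainer pull` with a `docker://` source and `apptainer exec` are the standard subcommands for this.

```python
import shlex


def apptainer_commands(docker_image, sif_path, command):
    """Build shell lines that convert a Docker image to SIF and run a command in it."""
    return [
        # Without a version, the site's current default apptainer module is loaded.
        "module load apptainer",
        # Convert the Docker image into a local SIF file.
        "apptainer pull "
        + shlex.quote(sif_path)
        + " "
        + shlex.quote("docker://" + docker_image),
        # Run the given command inside the container.
        "apptainer exec " + shlex.quote(sif_path) + " " + shlex.join(command),
    ]


for line in apptainer_commands("ubuntu:24.04", "ubuntu.sif", ["cat", "/etc/os-release"]):
    print(line)
```

The generated lines can be pasted into a job script. Pin a specific module version instead of the default if the workflow has to be reproducible.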
The November module updates and deprecations have been rolled out today. Please look out for deprecation warnings in your Slurm output.
Most notable new modules:
Common
aocc/5.0.0
aocl/ilp64/5.0.0
aocl/lp64/5.0.0
anaconda/2024.10
apptainer/1.3.5
cudnn/cu11x/9.5.1.17
cudnn/cu12x/9.5.1.17
micromamba/2.0.3
openjdk/8.u432-b06
openjdk/11.0.25+9
openjdk/17.0.13+11
openjdk/21.0.5+11
Scientific
gromacs/2024.4-ompi5.0-gcc13.2-mkl2023.2-cuda12.6
orca/6.0.1
siesta/5.2.0-ompi4.1-cf
The September module updates and deprecations were rolled out yesterday. Please look out for deprecation warnings in your Slurm output.
Most notable new modules:
- cuda-compat modules for better compatibility when using CUDA toolkits > 12.2 (the cuda module needs to be loaded first), see also https://collab.dvb.bayern/x/3PxdFw#NvidiaCUDAToolkit-CUDAToolkitinteroperability
- cuda-mps module for automatically starting and stopping the CUDA Multi-Process Service (highly recommended for code that cannot saturate A100 GPUs!), see also https://collab.dvb.bayern/x/m-xdFw#SubmittingGPUJobs-CUDAMulti-Process-Serverformaximumefficiency
- intel/2024.2.1 modules have been installed; intel/2024.1.0 is now deprecated, intel/2023.2.1 is still the default
- gcc/11.5.0 modules have been installed; gcc/11.4.0 is now deprecated
- orca/6.0.0 is available now, see also https://collab.dvb.bayern/x/yfxdFw
Since Friday, 30.08., a quota notification system has been active. Users will receive an email when their quota is exceeded. More information can be found in our Knowledge Base: Quota regulations
On LiCCA, a separate partition epyc-gpu-test has been created, and node licca047 with its 3 A100 GPUs has been moved to this partition. The TimeLimit in this partition is 6 hours, which lets users test with short job runs while the bigger epyc-gpu partition is loaded with longer-running jobs. If the test partition sees little use, we will move (some of) the GPUs back. All projects and users with GPU resources are automatically granted access to this partition.
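A short test job for this partition could be generated as in the following minimal Python sketch. The partition name, the 6-hour TimeLimit and the 3 GPUs of the node come from the announcement above. The helper name `build_test_job` and the use of sbatch's `--gpus` option are assumptions; the cluster's own documentation may prescribe a different way to request GPUs, for example via `--gres`.

```python
def build_test_job(command, gpus=1, time_limit="06:00:00"):
    """Return the text of a short batch script for the epyc-gpu-test partition."""
    if not 1 <= gpus <= 3:
        # The single node in this partition has 3 A100 GPUs.
        raise ValueError("epyc-gpu-test offers between 1 and 3 GPUs")
    lines = [
        "#!/bin/bash",
        "#SBATCH --partition=epyc-gpu-test",
        f"#SBATCH --gpus={gpus}",        # assumption: GPUs are requested via --gpus
        f"#SBATCH --time={time_limit}",  # must not exceed the 6-hour TimeLimit
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


print(build_test_job("nvidia-smi", gpus=1, time_limit="00:10:00"))
```

The printed script can be saved to a file and submitted with sbatch. Requesting only the time a test really needs helps keep the partition available for other users.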