General
The NVIDIA HPC Software Development Kit (HPC-SDK) includes the proven compilers, core, math and communication libraries, profilers and debugger needed to compile a broad variety of mostly scientific software for full, partial or hybrid use of NVIDIA GPUs in HPC environments. Apart from C, C++, Fortran and CUDA compilers, the HPC-SDK incorporates not only one or more NVIDIA CUDA Toolkits but also a full NVIDIA HPC-X stack. NVIDIA HPC-X is a comprehensive software package that includes Message Passing Interface (MPI), Symmetrical Hierarchical Memory (SHMEM) and Partitioned Global Address Space (PGAS) communication libraries, as well as various acceleration packages. This full-featured and tested package enables applications to achieve high performance, scalability and efficiency, and ensures that the communication libraries are fully optimized for the NVIDIA networking solutions used on LiCCA.
The following component versions are included:
Component | HPC-SDK 24.1, CUDA 11.8 | HPC-SDK 24.1, CUDA 12.3 | HPC-SDK 23.11, CUDA 11.8 | HPC-SDK 23.11, CUDA 12.3 | HPC-SDK 23.5, CUDA 11.8 | HPC-SDK 23.5, CUDA 12.1 |
---|---|---|---|---|---|---|
nvc++ | 24.1 | 24.1 | 23.11 | 23.11 | 23.5 | 23.5 |
nvc | 24.1 | 24.1 | 23.11 | 23.11 | 23.5 | 23.5 |
nvfortran | 24.1 | 24.1 | 23.11 | 23.11 | 23.5 | 23.5 |
nvcc | 11.8.89 | 12.3.101 | 11.8.89 | 12.3.52 | 11.8.89 | 12.1.105 |
NCCL | 2.18.5 | 2.18.5 | 2.18.5 | 2.18.5 | 2.18.1 | 2.18.1 |
NVSHMEM | 2.10.1 | 2.10.1 | 2.10.1 | 2.10.1 | 2.9.0 | 2.9.0 |
cuBLAS | 11.11.4.17 | 12.3.4.1 | 11.11.4.17 | 12.3.2.9 | 11.11.4.17 | 12.1.3.1 |
cuFFT | 10.9.0.58 | 11.0.12.1 | 10.9.0.58 | 11.0.11.19 | 10.9.0.58 | 11.0.2.54 |
cuFFTMp | 11.0.14 | 11.0.14 | 11.0.14 | 11.0.14 | 11.0.5 | 11.0.5 |
cuRAND | 10.3.0.86 | 10.3.4.101 | 10.3.0.86 | 10.3.4.52 | 10.3.0.86 | 10.3.2.106 |
cuSOLVER | 11.4.1.48 | 11.5.4.101 | 11.4.1.48 | 11.5.3.52 | 11.4.1.48 | 11.4.5.107 |
cuSOLVERMp | 0.4.3 | 0.4.3 | 0.4.2 | N/A | 0.4.0 | N/A |
cuSPARSE | 11.7.5.86 | 12.2.0.103 | 11.7.5.86 | 12.1.3.153 | 11.7.5.86 | 12.1.0.106 |
cuTENSOR | 2.0.0 | 2.0.0 | 1.7.0 | 1.7.0 | 1.7.0 | 1.7.0 |
Nsight Compute | 2023.3.1 | 2023.3.1 | 2023.3.0 | 2023.3.0 | 2023.1.1 | 2023.1.1 |
Nsight Systems | 2023.4.1 | 2023.4.1 | 2023.3.1 | 2023.3.1 | 2023.2.1.122 | 2023.2.1.122 |
HPC-X | 2.14 | 2.17.1 | 2.14 | 2.16 | 2.14 | 2.15 |
OpenBLAS | 0.3.23 | 0.3.23 | 0.3.23 | 0.3.23 | 0.3.20 | 0.3.20 |
Scalapack | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 |
Thrust | 1.15.1 | 2.2.0 | 1.15.1 | 2.2.0 | 1.15.1 | 2.0.1 |
CUB | 1.15.1 | 2.2.0 | 1.15.1 | 2.2.0 | 1.15.1 | 2.0.1 |
libcu++ | 1.8.1 | 2.2.0 | 1.8.1 | 2.2.0 | 1.8.1 | 1.9.0 |
Compilers
`nvc`, `nvc++` and `nvfortran` were formerly known as the PGI compilers before PGI was acquired by NVIDIA.
Language | Compiler | MPI compiler |
---|---|---|
C | nvc | mpicc |
C++ | nvc++ | mpic++ |
Fortran | nvfortran | mpif90 |
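As a minimal sketch of how these compilers are typically invoked (the file names, optimization flags and the GPU compute capability below are illustrative placeholders, not values prescribed by the SDK or this cluster):

```bash
# Load an HPC-SDK module first, e.g.:
#   ml nvhpc/cu12.3/24.1

# Plain C program compiled with nvc, with OpenACC offloading to the GPU
# (the compute capability cc80 is only an example target)
nvc -O2 -acc -gpu=cc80 saxpy.c -o saxpy

# MPI C++ program compiled with the HPC-X wrapper around nvc++
mpic++ -O2 solver.cpp -o solver

# MPI Fortran program compiled with the HPC-X wrapper around nvfortran
mpif90 -O2 simulation.f90 -o simulation
```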
There is also the `nvcc` binary, which is called a CUDA compiler driver. `nvcc` accepts both C/C++ and CUDA code, but needs a host compiler (typically `gcc` or `g++` from the GCC suite) for most applications. Further details on the `nvcc` compilation process can be found here. If you just need `nvcc`, loading the CUDA Toolkit is a more lightweight alternative and should be preferred over the HPC-SDK.
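For illustration, a hedged example of a direct `nvcc` invocation (the source file name is a placeholder; `-ccbin` explicitly selects the host compiler and can be omitted if the default host compiler found in `PATH` is acceptable):

```bash
# vector_add.cu is a placeholder CUDA source file
# -ccbin selects the host compiler used for the non-CUDA parts of the code
nvcc -O2 -ccbin g++ vector_add.cu -o vector_add
```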
Compiler bugs
Software that can be successfully compiled with GCC, the Intel compilers or LLVM cannot always be compiled with the HPC-SDK compilers. Sometimes additional compiler flags help. It is not uncommon for compilations with HPC-SDK compilers to fail due to unresolved compiler bugs.
Lmod Modules
Main Modules
At the time of writing, the following modules are available:
```bash
ml nvhpc/cu11.8/23.5    ml nvhpc/cu11.8/23.11    ml nvhpc/cu11.8/24.1
ml nvhpc/cu12.1/23.5    ml nvhpc/cu12.3/23.11    ml nvhpc/cu12.3/24.1
ml nvhpc/23.5           ml nvhpc/23.11           ml nvhpc/24.1
```
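For example, the installed versions can be listed and one of them loaded with standard Lmod commands (the version chosen below is arbitrary):

```bash
ml avail nvhpc          # list the nvhpc modules installed on the cluster
ml nvhpc/cu12.3/24.1    # load one specific HPC-SDK version
ml                      # show loaded modules; hpcx appears automatically
```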
HPC-X Submodules
By default the HPC-X module `hpcx` is loaded automatically right after loading any of the above HPC-SDK modules. While this is the desired behaviour for most use cases, there are also some other HPC-X modules available:

- `hpcx-mt`: This module enables multi-threading support in all of the HPC-X components. Please use this module in order to run multi-threaded applications.
- `hpcx-prof`: This module enables UCX compiled with profiling information.
- `hpcx-debug`: This module enables UCX/HCOLL/SHARP compiled in debug mode.
- `hpcx-stack`: This module is the same as `hpcx`, however all MPI functionality is excluded. This is what is also sometimes referred to as `nvhpc-nompi`.
Only one `hpcx` module can be loaded at a time. When choosing another, the currently loaded one will be unloaded first.
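A short example of switching between the HPC-X variants (the HPC-SDK version is a placeholder; the currently loaded `hpcx` module is swapped out automatically as described above):

```bash
ml nvhpc/cu12.3/24.1    # loads the default hpcx module alongside the SDK
ml hpcx-mt              # replaces hpcx with the multi-threaded variant
ml hpcx                 # switch back to the default variant
```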
MPI using HPC-X
HPC-X uses the Open MPI implementation under the hood. To ensure maximum compatibility with our hardware and Slurm, all HPC-X Open MPI versions have been manually recompiled to take all components into account. It is therefore also recommended to launch applications that use HPC-X MPI with Slurm's `srun` utility.
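A minimal sketch of a batch script that launches an HPC-X MPI application via `srun` (node and task counts, time limit and the binary name are placeholders; GPU and partition options depend on the actual LiCCA setup):

```bash
#!/usr/bin/env bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00

# Load the HPC-SDK module; hpcx (and with it Open MPI) is loaded automatically
ml nvhpc/cu12.3/24.1

# srun starts the MPI ranks through Slurm, as recommended above
srun ./mpi_app
```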