General

The NVIDIA HPC Software Development Kit (HPC-SDK) includes the proven compilers, core, math and communication libraries, profilers and debugger needed to compile a broad variety of (mostly scientific) software that makes full, partial or hybrid use of NVIDIA GPUs in HPC environments. Apart from C, C++, Fortran and CUDA compilers, the HPC-SDK incorporates not only one or more NVIDIA CUDA Toolkits but also a full NVIDIA HPC-X stack. NVIDIA HPC-X is a comprehensive software package that includes Message Passing Interface (MPI), Symmetrical Hierarchical Memory (SHMEM) and Partitioned Global Address Space (PGAS) communication libraries as well as various acceleration packages. This full-featured and tested package enables applications to achieve high performance, scalability and efficiency, and ensures that the communication libraries are fully optimized for the NVIDIA networking solutions used on LiCCA.

The following component versions are included:

HPC-SDK         | 24.1                        | 23.11                       | 23.5
Component       | CUDA 11.8    | CUDA 12.3    | CUDA 11.8    | CUDA 12.3    | CUDA 11.8    | CUDA 12.1
nvc++           | 24.1         | 24.1         | 23.11        | 23.11        | 23.5         | 23.5
nvc             | 24.1         | 24.1         | 23.11        | 23.11        | 23.5         | 23.5
nvfortran       | 24.1         | 24.1         | 23.11        | 23.11        | 23.5         | 23.5
nvcc            | 11.8.89      | 12.3.101     | 11.8.89      | 12.3.52      | 11.8.89      | 12.1.105
NCCL            | 2.18.5       | 2.18.5       | 2.18.5       | 2.18.5       | 2.18.1       | 2.18.1
NVSHMEM         | 2.10.1       | 2.10.1       | 2.10.1       | 2.10.1       | 2.9.0        | 2.9.0
cuBLAS          | 11.11.4.17   | 12.3.4.1     | 11.11.4.17   | 12.3.2.9     | 11.11.4.17   | 12.1.3.1
cuFFT           | 10.9.0.58    | 11.0.12.1    | 10.9.0.58    | 11.0.11.19   | 10.9.0.58    | 11.0.2.54
cuFFTMp         | 11.0.14      | 11.0.14      | 11.0.14      | 11.0.14      | 11.0.5       | 11.0.5
cuRAND          | 10.3.0.86    | 10.3.4.101   | 10.3.0.86    | 10.3.4.52    | 10.3.0.86    | 10.3.2.106
cuSOLVER        | 11.4.1.48    | 11.5.4.101   | 11.4.1.48    | 11.5.3.52    | 11.4.1.48    | 11.4.5.107
cuSOLVERMp      | 0.4.3        | 0.4.3        | 0.4.2        | N/A          | 0.4.0        | N/A
cuSPARSE        | 11.7.5.86    | 12.2.0.103   | 11.7.5.86    | 12.1.3.153   | 11.7.5.86    | 12.1.0.106
cuTENSOR        | 2.0.0        | 2.0.0        | 1.7.0        | 1.7.0        | 1.7.0        | 1.7.0
Nsight Compute  | 2023.3.1     | 2023.3.1     | 2023.3.0     | 2023.3.0     | 2023.1.1     | 2023.1.1
Nsight Systems  | 2023.4.1     | 2023.4.1     | 2023.3.1     | 2023.3.1     | 2023.2.1.122 | 2023.2.1.122
OpenMPI         | 3.1.5        | 3.1.5        | 3.1.5        | 3.1.5        | 3.1.5        | 3.1.5
HPC-X           | 2.14         | 2.17.1       | 2.14         | 2.16         | 2.14         | 2.15
OpenBLAS        | 0.3.23       | 0.3.23       | 0.3.23       | 0.3.23       | 0.3.20       | 0.3.20
Scalapack       | 2.2.0        | 2.2.0        | 2.2.0        | 2.2.0        | 2.2.0        | 2.2.0
Thrust          | 1.15.1       | 2.2.0        | 1.15.1       | 2.2.0        | 1.15.1       | 2.0.1
CUB             | 1.15.1       | 2.2.0        | 1.15.1       | 2.2.0        | 1.15.1       | 2.0.1
libcu++         | 1.8.1        | 2.2.0        | 1.8.1        | 2.2.0        | 1.8.1        | 1.9.0
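
If in doubt, the compiler and CUDA versions provided by the currently loaded module can be checked directly on the command line, for example:

nvc --version
nvcc --version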

Compilers

nvc, nvc++ and nvfortran were formerly known as the PGI compilers before PGI was acquired by NVIDIA.

Language | Compiler  | MPI compiler
C        | nvc       | mpicc
C++      | nvc++     | mpic++
Fortran  | nvfortran | mpif90
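
For illustration, a serial C program and its MPI counterpart could be compiled like this once one of the nvhpc modules listed below is loaded (source and output file names are placeholders):

nvc -O2 -o hello hello.c
mpicc -O2 -o hello_mpi hello_mpi.c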

There is also the nvcc binary, which is the CUDA compiler driver. nvcc accepts both C/C++ and CUDA code but needs a host compiler (typically gcc or g++ from the GCC suite) for most applications. Further details on the nvcc compilation process can be found here. If you just need nvcc, loading the CUDA Toolkit is a more lightweight alternative and should be preferred over the HPC-SDK.
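
As a minimal sketch (saxpy.cu is a placeholder file name), a CUDA source file can be compiled with nvcc while explicitly selecting the host compiler through the -ccbin option:

nvcc -O2 -ccbin g++ -o saxpy saxpy.cu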


Compiler bugs

Software that can be successfully compiled with GCC, Intel compilers or LLVM cannot always be compiled with the HPC-SDK compilers. Sometimes compiler flags help, but it is not uncommon for compilations with the HPC-SDK compilers to fail due to unresolved compiler bugs.

Lmod Modules

Main Modules

At the time of writing, the following modules are available:

Using CUDA 11.x
ml nvhpc/cu11.8/23.5
ml nvhpc/cu11.8/23.11
ml nvhpc/cu11.8/24.1
Using CUDA 12.x
ml nvhpc/cu12.1/23.5
ml nvhpc/cu12.3/23.11
ml nvhpc/cu12.3/24.1

Using CUDA 12.x (short)
ml nvhpc/23.5
ml nvhpc/23.11
ml nvhpc/24.1
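
Since the exact set of versions may change over time, the currently available modules can always be listed with Lmod itself, e.g.:

ml avail nvhpc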

HPC-X Submodules

By default the HPC-X module hpcx is loaded automatically right after loading any of the above HPC-SDK modules. While this is the desired behaviour for most use cases, there are also some other HPC-X modules available:

  • hpcx-mt This module enables multi-threading support in all of the HPC-X components. Please use this module in order to run multi-threaded applications.
  • hpcx-prof This module enables UCX compiled with profiling information.
  • hpcx-debug This module enables UCX/HCOLL/SHARP compiled in debug mode.
  • hpcx-stack This module is the same as hpcx, but with all MPI functionality excluded. This is what is sometimes also referred to as nvhpc-nompi.

Only one hpcx module can be loaded at a time. When choosing another, the currently loaded one will be unloaded first.
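
For example, to switch from the default module to the multi-threaded variant, simply load hpcx-mt and verify the swap by listing the loaded modules:

ml hpcx-mt
ml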


MPI using HPC-X

HPC-X uses the OpenMPI implementation under the hood. To ensure maximum compatibility with our hardware and Slurm, all HPC-X OpenMPI versions have been recompiled manually to take all components into account. It is therefore recommended to launch applications that use HPC-X MPI with Slurm's srun utility.
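
A minimal batch script could look like the following sketch; the node and task counts, the module version and the application name ./my_mpi_app are placeholders that need to be adapted to your job:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

ml nvhpc/24.1
srun ./my_mpi_app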