General
The NVIDIA HPC Software Development Kit (HPC-SDK) includes the proven compilers, core, math and communication libraries, profilers and debugger needed to compile a broad variety of mostly scientific software for full, partial or hybrid use of NVIDIA GPUs in HPC environments. Apart from C, C++, Fortran and CUDA compilers, the HPC-SDK incorporates not only one or more NVIDIA CUDA Toolkits but also a full NVIDIA HPC-X stack. NVIDIA HPC-X is a comprehensive software package that includes Message Passing Interface (MPI), Symmetrical Hierarchical Memory (SHMEM) and Partitioned Global Address Space (PGAS) communication libraries, as well as various acceleration packages. This full-featured and tested package enables applications to achieve high performance, scalability and efficiency, and ensures that the communication libraries are fully optimized for the NVIDIA networking solutions used on LiCCA.
The following component versions are included:
Component | HPC-SDK 24.1, CUDA 11.8 | HPC-SDK 24.1, CUDA 12.3 | HPC-SDK 23.11, CUDA 11.8 | HPC-SDK 23.11, CUDA 12.3 | HPC-SDK 23.5, CUDA 11.8 | HPC-SDK 23.5, CUDA 12.1 |
---|---|---|---|---|---|---|
nvc++ | 24.1 | 24.1 | 23.11 | 23.11 | 23.5 | 23.5 |
nvc | 24.1 | 24.1 | 23.11 | 23.11 | 23.5 | 23.5 |
nvfortran | 24.1 | 24.1 | 23.11 | 23.11 | 23.5 | 23.5 |
nvcc | 11.8.89 | 12.3.101 | 11.8.89 | 12.3.52 | 11.8.89 | 12.1.105 |
NCCL | 2.18.5 | 2.18.5 | 2.18.5 | 2.18.5 | 2.18.1 | 2.18.1 |
NVSHMEM | 2.10.1 | 2.10.1 | 2.10.1 | 2.10.1 | 2.9.0 | 2.9.0 |
cuBLAS | 11.11.4.17 | 12.3.4.1 | 11.11.4.17 | 12.3.2.9 | 11.11.4.17 | 12.1.3.1 |
cuFFT | 10.9.0.58 | 11.0.12.1 | 10.9.0.58 | 11.0.11.19 | 10.9.0.58 | 11.0.2.54 |
cuFFTMp | 11.0.14 | 11.0.14 | 11.0.14 | 11.0.14 | 11.0.5 | 11.0.5 |
cuRAND | 10.3.0.86 | 10.3.4.101 | 10.3.0.86 | 10.3.4.52 | 10.3.0.86 | 10.3.2.106 |
cuSOLVER | 11.4.1.48 | 11.5.4.101 | 11.4.1.48 | 11.5.3.52 | 11.4.1.48 | 11.4.5.107 |
cuSOLVERMp | 0.4.3 | 0.4.3 | 0.4.2 | N/A | 0.4.0 | N/A |
cuSPARSE | 11.7.5.86 | 12.2.0.103 | 11.7.5.86 | 12.1.3.153 | 11.7.5.86 | 12.1.0.106 |
cuTENSOR | 2.0.0 | 2.0.0 | 1.7.0 | 1.7.0 | 1.7.0 | 1.7.0 |
Nsight Compute | 2023.3.1 | 2023.3.1 | 2023.3.0 | 2023.3.0 | 2023.1.1 | 2023.1.1 |
Nsight Systems | 2023.4.1 | 2023.4.1 | 2023.3.1 | 2023.3.1 | 2023.2.1.122 | 2023.2.1.122 |
HPC-X | 2.14 | 2.17.1 | 2.14 | 2.16 | 2.14 | 2.15 |
OpenBLAS | 0.3.23 | 0.3.23 | 0.3.23 | 0.3.23 | 0.3.20 | 0.3.20 |
Scalapack | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 |
Thrust | 1.15.1 | 2.2.0 | 1.15.1 | 2.2.0 | 1.15.1 | 2.0.1 |
CUB | 1.15.1 | 2.2.0 | 1.15.1 | 2.2.0 | 1.15.1 | 2.0.1 |
libcu++ | 1.8.1 | 2.2.0 | 1.8.1 | 2.2.0 | 1.8.1 | 1.9.0 |
Compilers
`nvc`, `nvc++` and `nvfortran` were formerly known as the PGI compilers before PGI was acquired by NVIDIA.
Language | Compiler | MPI compiler |
---|---|---|
C | nvc | mpicc |
C++ | nvc++ | mpic++ |
Fortran | nvfortran | mpif90 |
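As a minimal sketch of how these compilers are typically invoked (the file names, optimization flags and the GPU compute capability below are illustrative placeholders, not values prescribed by the SDK or this cluster):

```bash
# Load an HPC-SDK module first, e.g.:
#   ml nvhpc/cu12.3/24.1

# Plain C program compiled with nvc, with OpenACC offloading to the GPU
# (the compute capability cc80 is only an example target)
nvc -O2 -acc -gpu=cc80 saxpy.c -o saxpy

# MPI C++ program compiled with the HPC-X wrapper around nvc++
mpic++ -O2 solver.cpp -o solver

# MPI Fortran program compiled with the HPC-X wrapper around nvfortran
mpif90 -O2 simulation.f90 -o simulation
```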
There is also the `nvcc` binary, which is called a CUDA compiler driver. `nvcc` accepts both C/C++ and CUDA code, but needs a host compiler (typically `gcc` or `g++` from the GCC suite) for most applications. Further details on the `nvcc` compilation process can be found here. If you just need `nvcc`, loading the CUDA Toolkit is a more lightweight alternative and should be preferred over the HPC-SDK.
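For illustration, a hedged example of a direct `nvcc` invocation (the source file name is a placeholder; `-ccbin` explicitly selects the host compiler and can be omitted if the default host compiler found in `PATH` is acceptable):

```bash
# vector_add.cu is a placeholder CUDA source file
# -ccbin selects the host compiler used for the non-CUDA parts of the code
nvcc -O2 -ccbin g++ vector_add.cu -o vector_add
```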
Compiler bugs
Software that can be successfully compiled with GCC, the Intel compilers or LLVM cannot always be compiled with the HPC-SDK compilers. Sometimes additional compiler flags help. It is not uncommon for compilations with HPC-SDK compilers to fail due to unresolved compiler bugs.
Lmod Modules
Main Modules
At the time of writing, the following modules are available:
```bash
ml nvhpc/cu11.8/23.5    ml nvhpc/cu11.8/23.11    ml nvhpc/cu11.8/24.1
ml nvhpc/cu12.1/23.5    ml nvhpc/cu12.3/23.11    ml nvhpc/cu12.3/24.1
ml nvhpc/23.5           ml nvhpc/23.11           ml nvhpc/24.1
```
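For example, the installed versions can be listed and one of them loaded with standard Lmod commands (the version chosen below is arbitrary):

```bash
ml avail nvhpc          # list the nvhpc modules installed on the cluster
ml nvhpc/cu12.3/24.1    # load one specific HPC-SDK version
ml                      # show loaded modules; hpcx appears automatically
```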
HPC-X Submodules
By default the HPC-X module `hpcx` is loaded automatically right after loading any of the above HPC-SDK modules. While this is the desired behaviour for most use cases, there are also some other HPC-X modules available:

- `hpcx-mt`: This module enables multi-threading support in all of the HPC-X components. Please use this module in order to run multi-threaded applications.
- `hpcx-prof`: This module enables UCX compiled with profiling information.
- `hpcx-debug`: This module enables UCX/HCOLL/SHARP compiled in debug mode.
- `hpcx-stack`: This module is the same as `hpcx`, however all MPI functionality is excluded. This is what is also sometimes referred to as `nvhpc-nompi`.
Only one `hpcx` module can be loaded at a time. When choosing another, the currently loaded one will be unloaded first.
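A short example of switching between the HPC-X variants (the HPC-SDK version is a placeholder; the currently loaded `hpcx` module is swapped out automatically as described above):

```bash
ml nvhpc/cu12.3/24.1    # loads the default hpcx module alongside the SDK
ml hpcx-mt              # replaces hpcx with the multi-threaded variant
ml hpcx                 # switch back to the default variant
```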
MPI using HPC-X
HPC-X uses the Open MPI implementation under the hood. To ensure maximum compatibility with our hardware and Slurm, all HPC-X Open MPI versions have been manually recompiled to take all components into account. It is therefore also recommended to launch applications that use HPC-X MPI with Slurm's `srun` utility.
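A minimal sketch of a batch script that launches an HPC-X MPI application via `srun` (node and task counts, time limit and the binary name are placeholders; GPU and partition options depend on the actual LiCCA setup):

```bash
#!/usr/bin/env bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00

# Load the HPC-SDK module; hpcx (and with it Open MPI) is loaded automatically
ml nvhpc/cu12.3/24.1

# srun starts the MPI ranks through Slurm, as recommended above
srun ./mpi_app
```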