2. Compute

The following table summarizes the available compute hardware and the Slurm partitions to which jobs must be submitted. Partitions shown in grey are currently reserved primarily for interactive use via Interactive Apps (Section 6) and cannot be targeted directly. For individual jobs (allocations), the default time limit is 1 hour and the maximum time limit is 2 days.
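
To see which partitions your account can actually target, along with their time limits and node counts, the standard Slurm query commands can be used from a login node (the format string below is a sketch; the exact columns you want may differ):

```bash
# List visible partitions with time limit, node count, memory, and GRES
sinfo -o "%20P %10l %6D %10m %G"

# Show the full configuration of one partition, e.g. its MaxTime
scontrol show partition lrz-hgx-h100-94x4
```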

Partitions

| Architecture | Slurm Partition | Number of Nodes | CPUs per Node | Memory per Node | GPUs per Node | Memory per GPU |
|---|---|---|---|---|---|---|
| HGX H100 (BayernKI) | lrz-hgx-h100-94x4 | 30 | 96 | 768 GB | 4 NVIDIA H100 | 94 GB HBM2 |
| HGX A100 | lrz-hgx-a100-80x4 | 5 | 96 | 1 TB | 4 NVIDIA A100 | 80 GB HBM2 |
| DGX A100 | lrz-dgx-a100-80x8 | 4 | 252 | 2 TB | 8 NVIDIA A100 | 80 GB HBM2 |
| DGX A100 | lrz-dgx-a100-40x8-mig | 1 | 252 | 1 TB | 8 NVIDIA A100 | 40 GB HBM2 |
| DGX-1 V100 | lrz-dgx-1-v100x8 | 1 | 76 | 512 GB | 8 NVIDIA Tesla V100 | 16 GB HBM2 |
| DGX-1 P100 | lrz-dgx-1-p100x8 | 1 | 76 | 512 GB | 8 NVIDIA Tesla P100 | 16 GB HBM2 |
| HPE Intel Skylake + P100 | lrz-hpe-p100x4 | 1 | 28 | 256 GB | 4 NVIDIA Tesla P100 | 16 GB HBM2 |
| V100 GPU Nodes | lrz-v100x2 (default) | 4 | 19 | 368 GB | 2 NVIDIA Tesla V100 | 16 GB HBM2 |
| CPU Nodes | lrz-cpu | 12 | 18 / 28 / 38 / 94 | min. 360 GB | -- | -- |
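
A minimal batch script targeting one of these partitions might look as follows (the partition is taken from the table above; the job name, resource sizes, and `train.py` workload are placeholders, not LRZ-specific requirements):

```bash
#!/bin/bash
#SBATCH --job-name=example-job
#SBATCH --partition=lrz-hgx-a100-80x4   # any directly targetable partition from the table
#SBATCH --gres=gpu:1                    # request 1 of the node's 4 A100s
#SBATCH --time=08:00:00                 # must stay within the 2-day maximum (default: 1 hour)
#SBATCH --output=job_%j.log             # %j expands to the job ID

srun python train.py                    # placeholder workload
```

Submit the script with `sbatch <script>` and monitor it with `squeue -u $USER`.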

Quick note on the partition names:
The naming convention (e.g., lrz-hgx-h100-94x4) can be roughly interpreted as:
<housing>-<platform>-<GPU model>-<VRAM per GPU in GB>x<number of GPUs per node>.

MIG stands for Multi-Instance GPU, a technology developed by NVIDIA to partition a single GPU into several smaller GPU instances, each with its own dedicated compute and memory resources.
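
An interactive allocation on the MIG partition can be requested with standard Slurm commands; note that the exact GRES name for a MIG slice (plain `gpu:1` vs. a profile such as `gpu:3g.20gb:1`) depends on how the site's Slurm configuration exposes the instances:

```bash
# Request one GPU instance on the MIG partition for one hour
salloc --partition=lrz-dgx-a100-40x8-mig --gres=gpu:1 --time=01:00:00
```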

GPUs

| GPU | Brand | Arch. | Year | FP32 TFLOPS | Tensor Core FP16 TFLOPS | Memory (GB) |
|---|---|---|---|---|---|---|
| P100 | NVIDIA | Pascal | 2016 | ~9.3 | - | 16 |
| V100 | NVIDIA | Volta | 2017 | ~15.7 | ~125 (1st Gen) | 16 |
| A100 | NVIDIA | Ampere | 2020 | ~19.5 | ~312 (3rd Gen) | 40/80 |
| H100 | NVIDIA | Hopper | 2022 | ~51 | ~1000 (4th Gen) | 94 |
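
The FP32 figures are theoretical peaks, i.e. CUDA core count × boost clock × 2 FLOPs per fused multiply-add. As a sanity check for the V100: 5120 cores × 1.53 GHz × 2 ≈ 15.7 TFLOPS, matching the table.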

Server


| | What it is | Built by | Usage |
|---|---|---|---|
| HGX | GPU baseboard & platform design | OEMs + NVIDIA | Used by partners to build servers |
| DGX | Complete, ready-to-use AI system | NVIDIA | Turnkey AI/HPC server from NVIDIA |