2. Compute
The following table summarizes the available compute hardware and the Slurm partitions to which jobs must be submitted. Partitions shown in grey are currently reserved primarily for interactive use via the 6. Interactive Apps and cannot be targeted directly. For individual jobs (allocations), the default time limit is 1 hour and the maximum time limit is 2 days.
Partitions
Architecture | Slurm Partition | Number of nodes | CPUs per node | Memory per node | GPUs per node | Memory per GPU |
---|---|---|---|---|---|---|
HGX H100 (BayernKI) | lrz-hgx-h100-94x4 | 30 | 96 | 768 GB | 4 NVIDIA H100 | 94 GB HBM3 |
HGX A100 | lrz-hgx-a100-80x4 | 5 | 96 | 1 TB | 4 NVIDIA A100 | 80 GB HBM2 |
DGX A100 | lrz-dgx-a100-80x8 | 4 | 252 | 2 TB | 8 NVIDIA A100 | 80 GB HBM2 |
DGX A100 | lrz-dgx-a100-40x8-mig | 1 | 252 | 1 TB | 8 NVIDIA A100 | 40 GB HBM2 |
DGX-1 V100 | lrz-dgx-1-v100x8 | 1 | 76 | 512 GB | 8 NVIDIA Tesla V100 | 16 GB HBM2 |
DGX-1 P100 | lrz-dgx-1-p100x8 | 1 | 76 | 512 GB | 8 NVIDIA Tesla P100 | 16 GB HBM2 |
HPE Intel Skylake + P100 | lrz-hpe-p100x4 | 1 | 28 | 256 GB | 4 NVIDIA Tesla P100 | 16 GB HBM2 |
V100 GPU Nodes | lrz-v100x2 (default) | 4 | 19 | 368 GB | 2 NVIDIA Tesla V100 | 16 GB HBM2 |
CPU Nodes | lrz-cpu | 12 | 18 / 28 / 38 / 94 | min. 360 GB | -- | -- |
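The partitions above can also be inspected from a login node with standard Slurm commands; a sketch (the exact columns and visibility depend on the local Slurm configuration):

```shell
# List visible partitions with availability, time limit, node count, and GRES (GPUs)
sinfo -o "%P %a %l %D %G"

# Show only the default V100 partition
sinfo --partition=lrz-v100x2
```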
Quick note on the partition names:
The naming convention (e.g. lrz-hgx-h100-94x4) can be roughly interpreted as:
<housing>-<platform>-<GPU model>-<VRAM per GPU>x<number of GPUs>.
MIG stands for Multi-Instance GPU, a technology developed by NVIDIA to partition a GPU into smaller GPU instances, each with its own dedicated resources.
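A minimal batch script targeting one of the partitions above might look like the following sketch. The partition name, GPU count, and time limits come from the table; the job name, output path, and command are placeholders:

```shell
#!/bin/bash
#SBATCH --partition=lrz-hgx-h100-94x4   # partition name from the table above
#SBATCH --gres=gpu:1                    # request 1 of the 4 H100s per node
#SBATCH --time=02:00:00                 # default is 1 h, maximum is 2 days
#SBATCH --job-name=my-job               # placeholder job name
#SBATCH --output=slurm-%j.out           # %j expands to the job ID

srun python train.py                    # placeholder command
```

Submitted with `sbatch script.sh`; omitting `--time` falls back to the 1-hour default.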
GPUs
GPU | Brand | Arch. | Year | FP32 TFLOPS | Tensor Cores FP16 TFLOPS | Memory (GB) |
---|---|---|---|---|---|---|
P100 | NVIDIA | Pascal | 2016 | ~9.3 | -- | 16 |
V100 | NVIDIA | Volta | 2017 | ~15.7 | ~125 (1st Gen) | 16 |
A100 | NVIDIA | Ampere | 2020 | ~19.5 | ~312 (3rd Gen) | 40/80 |
H100 | NVIDIA | Hopper | 2022 | ~51 | ~1000 (4th Gen) | 94 |
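The generational jumps in the table can be sanity-checked with a few lines of Python. The numbers below are the approximate vendor peak figures from the table, not measured throughput:

```python
# Approximate peak figures from the GPU table above (not benchmarks).
gpus = {
    # name: (fp32_tflops, fp16_tensor_tflops or None, memory_gb)
    "P100": (9.3, None, 16),
    "V100": (15.7, 125, 16),
    "A100": (19.5, 312, 80),
    "H100": (51, 1000, 94),
}

def speedup(old, new, idx):
    """Ratio of peak figures between two GPUs (None if not comparable)."""
    x, y = gpus[old][idx], gpus[new][idx]
    return None if x is None or y is None else y / x

# H100 offers roughly 2.6x the peak FP32 of A100 ...
print(f"A100 -> H100 FP32: {speedup('A100', 'H100', 0):.1f}x")
# ... and roughly 3.2x the Tensor-Core FP16 throughput
print(f"A100 -> H100 FP16 tensor: {speedup('A100', 'H100', 1):.1f}x")
```

This also shows why the P100 column has no Tensor-Core entry: Tensor Cores first appeared with Volta (V100).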
Server
Server | What it is | Built by | Usage |
---|---|---|---|
HGX | GPU baseboard & platform design | OEMs + NVIDIA | Used by partners to build servers |
DGX | Complete, ready-to-use AI system | NVIDIA | Turnkey AI/HPC server from NVIDIA |