1. General Description and Resources
Aim
This system primarily serves the Big Data & AI communities and their GPU resource needs. Support for other use cases of these GPUs is currently limited.
Compute Hardware
The following table summarizes the available compute hardware and the Slurm partitions to which jobs targeting these resources must be submitted (partitions shown in grey are currently reserved primarily for interactive use via Interactive Web Servers and typically cannot be targeted directly). The default time limit for an individual job (allocation) is one hour; the maximum is two days (`--time=2-00:00:00`). A minimal submission sketch follows the table.
| Architecture | Slurm Partition | Number of nodes | CPUs per node | Memory per node | GPUs per node | Memory per GPU |
|---|---|---|---|---|---|---|
| HGX H100 Architecture | lrz-hgx-h100-92x4 | 30 | 96 | 768 GB | 4 NVIDIA H100 | 94 GB HBM2 |
| HGX A100 Architecture | lrz-hgx-a100-80x4 | 5 | 96 | 1 TB | 4 NVIDIA A100 | 80 GB HBM2 |
| DGX A100 Architecture | lrz-dgx-a100-80x8 | 4 | 252 | 2 TB | 8 NVIDIA A100 | 80 GB HBM2 |
| DGX A100 Architecture | lrz-dgx-a100-40x8-mig | 1 | 252 | 1 TB | 8 NVIDIA A100 | 40 GB HBM2 |
| DGX-1 V100 Architecture | lrz-dgx-1-v100x8 | 1 | 76 | 512 GB | 8 NVIDIA Tesla V100 | 16 GB HBM2 |
| DGX-1 P100 Architecture | lrz-dgx-1-p100x8 | 1 | 76 | 512 GB | 8 NVIDIA Tesla P100 | 16 GB HBM2 |
| HPE Intel Skylake + NVIDIA Node | lrz-hpe-p100x4 | 1 | 28 | 256 GB | 4 NVIDIA Tesla P100 | 16 GB HBM2 |
| V100 GPU Nodes | lrz-v100x2 (default) | 4 | 19 | 368 GB | 2 NVIDIA Tesla V100 | 16 GB HBM2 |
| CPU Nodes | lrz-cpu | 12 | 18 / 28 / 38 / 94 | min. 360 GB | -- | -- |
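As a minimal sketch, a batch job targeting one of the partitions above within the stated limits could look like the script below. It uses only standard Slurm directives (--partition, --gres, --time); the script name, job name, and the nvidia-smi sanity check are illustrative assumptions, not prescribed settings.

```bash
#!/bin/bash
#SBATCH --partition=lrz-hgx-h100-92x4   # a directly targetable partition from the table
#SBATCH --gres=gpu:1                    # request one of the node's four H100 GPUs
#SBATCH --time=2-00:00:00               # maximum allowed allocation (default is 1 hour)
#SBATCH --job-name=h100-smoke-test     # illustrative job name

# Quick sanity check: print the GPUs visible inside the allocation.
srun nvidia-smi
```

Submit the script with sbatch and monitor it with squeue. An equivalent interactive allocation can typically be obtained with, for example, srun --partition=lrz-v100x2 --gres=gpu:1 --time=01:00:00 --pty bash (flags shown are standard Slurm; adjust to your needs).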