1. General Description and Resources

Aim

This system is primarily oriented towards the Big Data and AI communities and their GPU resource needs. Support for other GPU use cases is currently limited.

Compute Hardware

The following table summarizes the available compute hardware resources and the Slurm partitions to which jobs targeting these resources must be submitted. Some partitions are currently dedicated primarily to interactive use via the Interactive Web Servers and can typically not be targeted directly. The default time limit for individual jobs (allocations) is one hour; the maximum is two days (--time=2-00:00:00). An example batch script follows the table.


Slurm Partition          Number of nodes   CPUs per node       Memory per node   GPUs per node         Memory per GPU

HGX H100 Architecture
lrz-hgx-h100-92x4        30                96                  768 GB            4 NVIDIA H100         94 GB HBM3

HGX A100 Architecture
lrz-hgx-a100-80x4        5                 96                  1 TB              4 NVIDIA A100         80 GB HBM2

DGX A100 Architecture
lrz-dgx-a100-80x8        4                 252                 2 TB              8 NVIDIA A100         80 GB HBM2
lrz-dgx-a100-40x8-mig    1                 252                 1 TB              8 NVIDIA A100         40 GB HBM2

DGX-1 V100 Architecture
lrz-dgx-1-v100x8         1                 76                  512 GB            8 NVIDIA Tesla V100   16 GB HBM2

DGX-1 P100 Architecture
lrz-dgx-1-p100x8         1                 76                  512 GB            8 NVIDIA Tesla P100   16 GB HBM2

HPE Intel Skylake + NVIDIA Node
lrz-hpe-p100x4           1                 28                  256 GB            4 NVIDIA Tesla P100   16 GB HBM2

V100 GPU Nodes
lrz-v100x2 (default)     4                 19                  368 GB            2 NVIDIA Tesla V100   16 GB HBM2

CPU Nodes
lrz-cpu                  12                18 / 28 / 38 / 94   min. 360 GB       --                    --
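
As a minimal illustration of how these partitions are targeted, the batch script below requests a single A100 GPU on lrz-hgx-a100-80x4 for two hours. It is a sketch under common assumptions: the --gres=gpu syntax presumes GPUs are scheduled through Slurm's generic-resources plugin, and the job name and payload (nvidia-smi as a sanity check) are placeholders to be replaced by the actual workload.

  #!/bin/bash
  #SBATCH --partition=lrz-hgx-a100-80x4    # HGX A100 nodes from the table above
  #SBATCH --gres=gpu:1                     # one of the four A100 GPUs per node (assumes GRES-based GPU scheduling)
  #SBATCH --time=02:00:00                  # two hours; must stay within the two-day maximum (--time=2-00:00:00)
  #SBATCH --job-name=gpu-sanity-check      # placeholder job name

  srun nvidia-smi                          # print the allocated GPU(s) as a simple sanity check

The currently configured partitions, their time limits, node counts, and generic resources can also be queried directly from Slurm, e.g. with sinfo -o "%P %l %D %G".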