11. Public Datasets and Containers on the LRZ AI Systems
When developing new AI methods or evaluating existing ones, ML/AI researchers and scientists routinely use public datasets. Different research groups often use the very same datasets, and each group ends up downloading its own copy to its own storage. For example, more than one research group might download the AlphaFold database needed for predicting 3D protein structures (see https://alphafold.ebi.ac.uk/, >2 TB). In the past, this has led to data replication and wasted storage capacity for both users and LRZ.
To avoid this, the LRZ AI Systems offer a dedicated Data Science Storage (DSS) container for storing public datasets and, potentially, Enroot container images that are of interest to more than one researcher.
Available datasets and Enroot images
11.0 Available Public Datasets
11.1 Available Enroot Container Images (currently none provided, see below for requests)
How to request the addition of public datasets
Users interested in a particular dataset need to:
- make sure the dataset is licensed for public usage and requires neither an individual license nor registration
- open a ticket with the LRZ Servicedesk, providing the location of the dataset and a justification for public interest (including the expected target audience)
- provide clear instructions for downloading it (ideally in the form of a shell script)
An example request is as follows:
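As an illustration only, the download instructions attached to such a ticket might be a short shell script along the following lines. This is a minimal sketch, not an actual LRZ request: the URL, the target directory, the `DRY_RUN` switch, and the helper names `run` and `download_dataset` are all hypothetical placeholders, and it assumes the dataset is distributed as a single `.tar.gz` archive.

```shell
#!/bin/sh
# Hypothetical download script of the kind a dataset request could include.
# The URL and target directory below are placeholders, not real locations.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1: only print the commands; 0: actually run them

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# download_dataset URL TARGET_DIR: fetch a .tar.gz archive and unpack it.
download_dataset() {
    url="$1"
    target_dir="$2"
    archive="$target_dir/$(basename "$url")"
    run mkdir -p "$target_dir"
    # --continue resumes an interrupted download instead of starting over
    run wget --continue --directory-prefix "$target_dir" "$url"
    run tar -xzf "$archive" -C "$target_dir"
}

download_dataset "https://example.org/datasets/example-dataset.tar.gz" "./example-dataset"
```

By default the script only prints the commands it would run, which makes it easy to review before anything is downloaded; setting `DRY_RUN=0` executes them.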
Acceptance and implementation are subject to feasibility and available resources.
How to request Enroot images on the AI systems
Users interested in a particular image need to:
- make sure the image is licensed for public usage and requires neither an individual license nor registration
- make sure the image is not already available directly from NVIDIA NGC, Docker Hub, or another public repository
- open a ticket with the LRZ Servicedesk, providing the location of the Dockerfile for building the image and a justification for public interest (including the expected target audience)
- provide clear instructions for building the image (in case it deviates from the standard procedure)
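Build instructions for such a request could be captured in a short script. The sketch below assumes that the "standard procedure" means building the image with `docker build` and converting it with `enroot import` from the local Docker daemon; the source does not define the procedure, so this is an assumption. The image tag, file paths, the `DRY_RUN` switch, and the helper names `run` and `build_enroot_image` are hypothetical.

```shell
#!/bin/sh
# Hypothetical build script of the kind an Enroot image request could include.
# Assumes Docker and Enroot are installed; all names below are placeholders.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1: only print the commands; 0: actually run them

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_enroot_image DOCKERFILE CONTEXT_DIR IMAGE_TAG SQSH_FILE
build_enroot_image() {
    dockerfile="$1"
    context_dir="$2"
    image_tag="$3"
    sqsh_file="$4"
    # Build the Docker image from the Dockerfile referenced in the ticket
    run docker build --tag "$image_tag" --file "$dockerfile" "$context_dir"
    # Convert the locally built image into an Enroot squashfs image
    run enroot import --output "$sqsh_file" "dockerd://$image_tag"
}

build_enroot_image "./Dockerfile" "." "example-image:latest" "./example-image.sqsh"
```

As written, the script prints the two commands without running them; with `DRY_RUN=0` it performs the build and the import.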
Acceptance and implementation are subject to feasibility and available resources.