9. Creating and Reusing a Custom Enroot Container Image
NVIDIA NGC offers a catalogue of containers covering a broad spectrum of software packages (see 5. Using NVIDIA NGC Containers on the LRZ AI Systems). These containers supply the CUDA Toolkit, cuDNN libraries, and the required NVIDIA dependencies. It is also possible to use containers from a different container registry or catalogue, in which case the latter might not hold true.
No matter where your container image comes from, your workload might depend on a package not provided by that image. This guide describes how to create a new Enroot container image by extending an existing container image. The required steps depend on whether your image comes from the NGC catalogue or not.
1. Choose a base image and a target system/partition
We refer in this guide to an image from the NVIDIA NGC or another catalogue (e.g., Docker Hub) as the base image. For example, the Docker image docker://nvcr.io#nvidia/tensorflow:20.12-tf1-py3 (from the NVIDIA NGC catalogue) is used in Section 3 of this guide as the base image. Let us assume the label of that image is stored in a variable (e.g., BASE_IMAGE=docker://nvcr.io#nvidia/tensorflow:20.12-tf1-py3).
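For instance, you can set this variable in your current shell as follows (using the example label above):
$ export BASE_IMAGE=docker://nvcr.io#nvidia/tensorflow:20.12-tf1-py3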
Choose the target system where your final custom image will be used (see 1. General Description and Resources for available target systems). For example, the partition lrz-v100x2 is used in this guide.
Create an interactive allocation of resources on the target system. A single GPU suffices for this task.
$ salloc -p lrz-v100x2 --gres=gpu:1
Execute a terminal within the allocated machine.
$ srun --pty bash
If you chose an image from the NVIDIA NGC catalogue, skip the next section and go directly to Section 3.
2. Dealing with base images from other catalogues
The presence of some NVIDIA libraries within a container image might produce crashes. If you are not using a container from NGC, make sure your image does not include:
- The CUDA toolkit library
- The NVIDIA container toolkit library (libnvidia-container)
Let the Enroot runtime add these required libraries into your container instead. For that, some extra configuration steps are required.
First, create an Enroot container out of the chosen base image.
$ enroot import -o image-no-cuda.sqsh $BASE_IMAGE # creates an Enroot container image out of that docker container
$ enroot create --name my_container_first_step image-no-cuda.sqsh # creates an Enroot container named "my_container_first_step"
Start bash within the created container.
$ enroot start my_container_first_step bash
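At this point you can optionally verify that the base image does not already bundle the libraries listed above. The following is only a quick heuristic check, assuming the usual install path /usr/local/cuda and a populated linker cache:
$ ls -d /usr/local/cuda* 2>/dev/null || echo "no CUDA toolkit found" # is a CUDA toolkit bundled in the image?
$ ldconfig -p | grep libnvidia-container || echo "libnvidia-container not found" # is libnvidia-container present?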
You must add a couple of environment variables within this container. These variables will let Enroot know that you want to use CUDA, and the runtime will copy the needed libraries into the container (see https://github.com/NVIDIA/NVIDIA-container-runtime#environment-variables-oci-spec). Examples of the variables described in the NVIDIA documentation are the following:
NVIDIA_DRIVER_CAPABILITIES # what do you need from the driver: compute, utilities, rendering?
NVIDIA_REQUIRE_CUDA # what version of CUDA do you need for your application
NVIDIA_VISIBLE_DEVICES # which devices should be visible for this container
Once you have figured out the needed variables and their values (this depends on what you need; check the NVIDIA documentation or get in touch with us), add these variables to the file /etc/environment of your container. The following code block shows an example.
echo "NVIDIA_DRIVER_CAPABILITIES=compute,utility,video" >> /etc/environment echo "NVIDIA_REQUIRE_CUDA=cuda>=9.0" >> /etc/environment echo "NVIDIA_VISIBLE_DEVICES=all" >> /etc/environment
Exit the container and export it as an Enroot image.
$ exit
$ enroot export --output my_temporal_container.sqsh my_container_first_step # creates an Enroot image called my_temporal_container.sqsh in the current path
Now you have a container prepared for CUDA, and the Enroot runtime will rely on the added variables to add what is needed automatically within it. Go to Section 3 with BASE_IMAGE="$PWD/my_temporal_container.sqsh".
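For example, before moving on:
$ export BASE_IMAGE="$PWD/my_temporal_container.sqsh"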
3. Creating an extended Enroot image
Create an Enroot container out of the base image:
$ enroot import -o base_image.sqsh $BASE_IMAGE # creates an Enroot container image out of that docker container
$ enroot create --name my_container base_image.sqsh # creates an Enroot container named "my_container"
Start the created Enroot container and install any needed packages (this example assumes the matplotlib Python package needs to be added). Exit the container once the packages have been added.
$ enroot start my_container
$ pip3 install matplotlib
$ exit
Export the modified Enroot container as an Enroot container image.
$ enroot export --output my_container.sqsh my_container # creates an Enroot image called my_container.sqsh in the current path
# (assuming PWD=/my-path, the complete path to the created image is /my-path/my_container.sqsh)
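If you want to verify the new image before releasing the allocation, an optional check (not part of the required steps; the container name verify_container is only a placeholder) is to create a throwaway container from it and import the installed package:
$ enroot create --name verify_container my_container.sqsh # temporary container created from the new image
$ enroot start verify_container python3 -c "import matplotlib; print(matplotlib.__version__)" # should print the installed version
$ enroot remove verify_container # delete the temporary container again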
Release the allocated resources.
$ exit
*Note: Sometimes, for installing some applications you need to be root within the container (e.g., installing software using the apt package manager in Debian- and Ubuntu-based containers). In this case, start the container as root with the --root option as described in 4. Introduction to Enroot: The Software Stack Provider for the LRZ AI Systems. A complete example is shown next.
$ enroot start --root my_container
# apt update
# apt install python3-dev
# exit
$ enroot export --output my_container.sqsh my_container
For more information about the capabilities of Enroot, we recommend checking the official documentation https://github.com/NVIDIA/enroot.
4. Reuse the custom image in your jobs
For reusing your custom Enroot container image, you just need to indicate that image in the --container-image option when submitting jobs (interactive or batch ones) to the target system. For example (assuming you already have an allocation on lrz-v100x2):
$ srun --pty --container-image='/my-path/my_container.sqsh' bash # will execute bash on a container created out of your custom Enroot container image
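For batch jobs, the same option can be passed to srun inside a submission script. The following is only a minimal sketch: the output file name and the script my_script.py are placeholders, and the partition and image path assume the examples above.
#!/bin/bash
#SBATCH -p lrz-v100x2
#SBATCH --gres=gpu:1
#SBATCH -o custom_image_job.%j.out

# run the workload inside a container created from the custom image
srun --container-image='/my-path/my_container.sqsh' python3 my_script.py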