General

There are several ways to use python on the HPC cluster, each with specific advantages, disadvantages, and levels of customizability.

The following commands print the path of the python interpreter that is currently active (use python --version or python3 --version to print the version number itself):

which python
- or -
which python3
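The same information can also be obtained from inside a running interpreter. The following is a minimal sketch using only the standard library; the function name describe_interpreter is an illustrative choice and not part of any cluster tooling.

```python
import sys


def describe_interpreter():
    """Return the path and version of the interpreter running this code."""
    # sys.version_info holds (major, minor, micro, ...); keep the first three.
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {"executable": sys.executable, "version": version}


info = describe_interpreter()
print(f"{info['executable']} (Python {info['version']})")
```

Running this with different modules loaded shows which interpreter a given module or environment actually provides.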

Licenses

The conda ecosystem uses so-called channels to distribute packages. When using anaconda with the default channel, either through our provided modules or, for example, by installing miniconda yourself (not recommended), please make sure that you comply with its current Terms of Service. If unsure, use the conda-forge channel by default, e.g. by using the miniforge or micromamba modules from the start.

Don'ts

  • Don't install full Anaconda3 distributions yourself. This is explicitly discouraged, as they require unnecessarily large amounts of disk space and inodes, and will be needlessly backed up to tape if installed in a home directory.
  • Don't use the OS-level python version (/usr/bin/python or /usr/bin/python3, as shown by the which command described above). It is stable but rather old, and most likely not the right choice for your scientific workload. It may also change during a cluster-level OS upgrade.
  • Don't mix conda and pip install commands when installing packages. While this may sometimes work, it can cause a lot of trouble and hard-to-debug errors.
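A script can guard against accidentally running with the OS-level interpreter. The sketch below assumes that "OS-level" means any interpreter located directly in /usr/bin, as in the two paths named above; the helper name is_os_level_python is hypothetical.

```python
import posixpath
import sys


def is_os_level_python(executable):
    """Return True if the interpreter path lies directly in /usr/bin."""
    # Covers /usr/bin/python, /usr/bin/python3 and versioned names next to them.
    return posixpath.dirname(executable) == "/usr/bin"


if is_os_level_python(sys.executable):
    print("Warning: running with the OS-level python, consider loading a module.")
```

The check compares paths only and does not follow symbolic links, so it is a convenience warning rather than a strict safeguard.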

Modules providing python

Anaconda Distribution

These modules are primarily targeted at beginners and/or simple use cases. The base environment of the anaconda modules contains a broad set of packages curated by Anaconda.org. The extensibility of the base environment is limited, since it is write-protected (read-only) for every user.

List available anaconda versions
module av anaconda

It is recommended to always use the most recent version, as old versions might be deprecated and removed at a later point. (Note that you can always create your own environments, which are independent of the modules and their base environment; see below.)

If you have no clue which python to use, use this!
module load anaconda

Intel® Distribution(s) for python

Intel maintains optimized (using Math Kernel Library, MKL) and improved distributions of python under the framework of the Intel® AI Analytics Toolkit which have been conveniently made available via modules.

List available Intel python versions
module av intelpython intel-aikit
  • Intel® Distribution for Python
  • Intel® Distribution of Modin
  • Intel® Optimization for PyTorch
  • Intel® Optimization for TensorFlow

Modules providing python package/environment managers

These modules allow you to install your own private python environments, which you fully control yourself.

README

By default, all provided package manager modules store private python environments and the package cache in the scratch area of the GPFS filesystem, which (unlike the home area of the GPFS filesystem) is not part of the daily filesystem backup. The rationale is simple: environments can be easily recreated, and this saves precious disk space on the backup system. The exact paths are:

  • /hpc/gpfs2/scratch/u/$USER/.conda for anaconda/miniforge
  • /hpc/gpfs2/scratch/u/$USER/micromamba for micromamba
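To check whether a given environment prefix lives in one of these default locations, the paths can be reconstructed from the user name. This is only a sketch: the helper names default_env_roots and is_on_scratch are hypothetical, and the path layout is taken literally from the two entries above.

```python
from pathlib import PurePosixPath

# Scratch root as given in the list above.
SCRATCH_ROOT = PurePosixPath("/hpc/gpfs2/scratch/u")


def default_env_roots(user):
    """Return the default storage locations for the given user name."""
    base = SCRATCH_ROOT / user
    return {"conda": base / ".conda", "micromamba": base / "micromamba"}


def is_on_scratch(prefix, user):
    """Return True if prefix is one of the default roots or lies below one."""
    prefix = PurePosixPath(prefix)
    roots = default_env_roots(user).values()
    return any(root == prefix or root in prefix.parents for root in roots)
```

A possible use is calling is_on_scratch(sys.prefix, user_name) at the start of a job script to confirm that the active environment is not stored in the backed-up home area.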

Avoid modifying your ~/.bashrc

Even though calling conda init is taught in many tutorials and might be a good idea on your personal computer or workstation, it is explicitly discouraged in an HPC environment, since it pollutes your ~/.bashrc and may even activate an environment by default at every login. It is recommended to load python package managers via Lmod modules exclusively.

pip or conda?

Conda environments are self-contained environments (no external OS-level dependencies beyond very basic ones like libc) that include both python and non-python packages. venv environments managed with pip can only contain python packages, which does not work when non-python dependencies are required. External dependencies will not be installed at the cluster level, as we try to keep the number of OS-level packages on the compute nodes minimal. Please always use a conda-based package manager.
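To tell from within python which kind of environment is active, one can rely on two observable properties: a conda environment contains a conda-meta directory in its prefix, and a venv has a sys.prefix that differs from sys.base_prefix. The function name classify_environment in this sketch is an illustrative choice.

```python
import os
import sys


def classify_environment(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Classify an environment prefix as 'conda', 'venv' or 'other'."""
    # Every conda environment keeps its package metadata in <prefix>/conda-meta.
    if os.path.isdir(os.path.join(prefix, "conda-meta")):
        return "conda"
    # Inside a venv, sys.prefix points to the venv and base_prefix to its parent.
    if prefix != base_prefix:
        return "venv"
    return "other"


print(classify_environment())
```

Note that a venv created on top of a conda-provided interpreter is reported as "venv", since its own prefix has no conda-meta directory.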

Anaconda and Miniforge

After loading any of these modules, the (base) environment will be activated automatically. While this environment is write-protected, you may simply use conda to create your own, independent environments.

About miniconda

It is not necessary to install miniconda yourself, because the anaconda module already provides the same conda executable.

Creating a new private environment with a specific python version and other packages
module load anaconda # or: module load miniforge
conda create -n myenv -c <channel like conda-forge,intel,bioconda,etc> python=<version, e.g. 3.11> numpy scipy ...

Default channel

  • anaconda: anaconda
  • miniforge: conda-forge
Activate a named environment and install more packages
conda activate myenv
conda install -c <channel> mypackage
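When environments are created from a script (for example to set up the same environment for several projects), it can help to assemble the command as an argument list. The sketch below only uses the options shown above (create, -n, -c, python=<version>); the function name build_create_command is hypothetical.

```python
import shlex


def build_create_command(tool, env_name, channel, python_version, packages=()):
    """Assemble a 'create' command line as a list of arguments."""
    return [
        tool, "create",
        "-n", env_name,
        "-c", channel,
        f"python={python_version}",
        *packages,
    ]


command = build_create_command(
    "conda", "myenv", "conda-forge", "3.11", ["numpy", "scipy"]
)
print(shlex.join(command))
# prints: conda create -n myenv -c conda-forge python=3.11 numpy scipy
```

The resulting list can be passed to subprocess.run after the corresponding module has been loaded in the calling shell.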

Micromamba

Unlike Anaconda or Miniforge, Micromamba does not require a base environment, as it is merely a single C++ executable. While the base environment still exists for compatibility reasons, it is completely empty by default and its use is not recommended. For this reason, it is also not activated automatically. micromamba implements a subset of conda functionality that is sufficient for most use cases, and usually executes faster than conda. It also implements some commands, such as repoquery (not available in conda), to query package repositories and package dependencies effectively. micromamba is also conveniently aliased as mm.

Creating a new private environment
module load micromamba
mm create -n myenv -c <channel like conda-forge,intel,bioconda,etc> python=<version, e.g. 3.11> numpy scipy ...
Activate a named environment and install more packages
mm activate myenv
mm install -c <channel> mypackage
Set a default channel for the current active environment
# may be either append or prepend
mm config append channels conda-forge --env
# set it globally (not recommended):
mm config append channels conda-forge
# next installation will use conda-forge channel by default
mm install mypackage
Run in a specific environment without activation
mm run -n myenv python3 myscript.py myargs ...
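The run subcommand is also convenient when a driver script needs to start work in another environment. The following sketch assumes that the executable is reachable as micromamba on PATH after loading the module (shell aliases such as mm are usually not available to subprocesses); the function names are hypothetical.

```python
import subprocess


def build_run_command(env_name, script, *script_args, executable="micromamba"):
    """Assemble a 'run' command that executes a script in a named environment."""
    return [executable, "run", "-n", env_name, "python3", script, *script_args]


def run_in_environment(env_name, script, *script_args):
    """Run the script in the environment and raise if it exits with an error."""
    command = build_run_command(env_name, script, *script_args)
    return subprocess.run(command, check=True)


# Example call (not executed here):
# run_in_environment("myenv", "myscript.py", "myargs")
```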


Install GPU packages on the login node

Some python packages are available with GPU support. These packages take significantly longer to compile and build, and they result in larger binaries and downloads. There is an ongoing effort to avoid installing packages compiled for GPUs unnecessarily on CPU-only machines by default. This is accomplished by adding a run dependency on __cuda (a virtual package) that detects whether the local machine has a GPU. However, this introduces challenges, as the login nodes do not have GPUs and the compute nodes with GPUs do not have internet access. In this case, a user can override the default setting via the environment variable CONDA_OVERRIDE_CUDA to install GPU packages on the login node, to be used later on the compute node.

Pretend to be compatible with a certain CUDA version
# conda
CONDA_OVERRIDE_CUDA=12.2 conda install -c <channel> pytorch
# micromamba
CONDA_OVERRIDE_CUDA=12.2 mm install -c <channel> pytorch
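If the installation is driven from a python script, the override can be passed to the child process through its environment instead of changing the environment of the calling process. This is a minimal sketch; the helper name with_cuda_override is hypothetical, and 12.2 is simply the version used in the commands above.

```python
import os


def with_cuda_override(cuda_version, base_env=None):
    """Return a copy of the environment with CONDA_OVERRIDE_CUDA set."""
    # Copy first so that the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["CONDA_OVERRIDE_CUDA"] = cuda_version
    return env


# Example call (not executed here):
# subprocess.run(["conda", "install", "-c", channel, "pytorch"],
#                env=with_cuda_override("12.2"), check=True)
```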

Unfortunately, not all libraries behave the same way. Sometimes there are also special *-gpu and *-cpu variants available. In some cases (e.g. tensorflow in conda-forge), a GPU version is installed by default; without CONDA_OVERRIDE_CUDA on the login node this will be a very outdated version, and a CPU version can only be installed using tensorflow-cpu.

Install MKL versions of numeric libraries

The anaconda channel uses MKL versions of all important numeric libraries already by default.

Installing the MKL version of BLAS should be enough; everything else will follow
# conda
conda install -c <channel> libblas=*=*mkl
# micromamba
mm install -c <channel> libblas=*=*mkl
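After the installation, one way to check whether NumPy was built against MKL is to inspect its build configuration. The sketch below captures the text printed by numpy.show_config() and searches it for "mkl"; the function names are hypothetical, and the simple substring test is an assumption about how the configuration output names the library.

```python
import contextlib
import io


def numpy_build_report():
    """Return the text printed by numpy.show_config(), or None without NumPy."""
    try:
        import numpy
    except ImportError:
        return None
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()


def mentions_mkl(report):
    """Return True if the build report mentions MKL anywhere."""
    return "mkl" in report.lower()


report = numpy_build_report()
if report is None:
    print("NumPy is not installed in this environment.")
else:
    print("MKL mentioned in NumPy build:", mentions_mkl(report))
```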


Install MPI (mpi4py)

The recommended way to use python in combination with mpi4py depends on the conda channel you are using:

Channel is anaconda or main or default: This is one of the rare occurrences where it is best to use pip to install the package despite working in a conda-managed environment. Reason: there is currently no openmpi package in this channel that permits the use of an external OpenMPI installation.

Install mpi4py using pip (recommended only for anaconda/default/main channel)
module load anaconda
module load gcc openmpi/4.1 
conda activate env_mpi
pip --no-cache-dir install mpi4py

Channel conda-forge (Hint: do not use this if you are on anaconda channel already, since mixing anaconda and conda-forge channels can cause strange errors):

Install mpi4py using conda (conda-forge channel)
module load anaconda
module load gcc openmpi/4.1  
conda activate env_mpi
conda install -c conda-forge mpi4py openmpi=4.1.*=external_*

Make sure that the openmpi Lmod module matches the installed openmpi package in conda down to the minor version.
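The comparison "down to the minor version" can be expressed as a small check on the version strings. This sketch assumes plain dotted version strings such as 4.1.5; the function name same_major_minor is hypothetical.

```python
def same_major_minor(module_version, package_version):
    """Return True if both versions agree in their major and minor numbers."""
    return module_version.split(".")[:2] == package_version.split(".")[:2]


print(same_major_minor("4.1.5", "4.1.2"))  # True: both are 4.1
print(same_major_minor("4.1.5", "4.0.7"))  # False: 4.1 versus 4.0
```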

Both ways result in an environment where python can (and should) be started via Slurm's srun launcher.
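A quick way to verify the resulting environment is a script in which every MPI process reports its rank. The sketch below uses the standard mpi4py interface (MPI.COMM_WORLD, Get_rank, Get_size) and falls back gracefully if mpi4py cannot be imported; the function name describe_rank is an illustrative choice.

```python
def describe_rank():
    """Return 'rank R of N' for this process, or None if mpi4py is missing."""
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    comm = MPI.COMM_WORLD
    return f"rank {comm.Get_rank()} of {comm.Get_size()}"


message = describe_rank()
if message is None:
    print("mpi4py could not be imported in this environment.")
else:
    print(message)
```

When the script is started through srun with several tasks, each task should print a different rank and the same total; if every task reports a total of 1, the MPI library in the environment is probably not the one provided by the loaded openmpi module.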

Python environments as Lmod modules for classes

The HPC team offers to create prebuilt conda environments for classes according to a package specification; these can be loaded using Lmod modules. Students will not be able to modify these environments, so this solution only makes sense if all students need exactly the same environment. The benefit is a ready-to-use environment (saving time) which needs to be installed only once (saving space).

Backing up environments

Backup an environment
# conda
conda env export -n myenv > ~/myenv-backup.yml
# micromamba
mm env export -n myenv > ~/myenv-backup.yml

Restoring environments

Restore an environment
# conda
conda env create -n mynewenv --file ~/myenv-backup.yml
# micromamba
mm env create -n mynewenv --file ~/myenv-backup.yml
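Since environments on scratch are not backed up, it can be useful to script the export step, for example to back up several environments at once. This is a sketch under assumptions: the helper names are hypothetical, the file naming follows the myenv-backup.yml pattern used above, and the tool must be available on PATH (after loading the corresponding module) when export_environment is called.

```python
import subprocess
from pathlib import Path


def backup_path(env_name, directory="~"):
    """Return the backup file location for an environment name."""
    return Path(directory).expanduser() / f"{env_name}-backup.yml"


def export_environment(env_name, tool="conda", directory="~"):
    """Write '<tool> env export -n <env_name>' to the backup file."""
    target = backup_path(env_name, directory)
    with target.open("w") as handle:
        subprocess.run(
            [tool, "env", "export", "-n", env_name],
            stdout=handle,
            check=True,
        )
    return target


# Example call (not executed here):
# for name in ["myenv", "env_mpi"]:
#     export_environment(name)
```

Storing the resulting files in the home directory keeps them inside the daily backup, while the environments themselves stay on scratch.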