FAQ: Conda and Python Virtual Environment on LRZ HPC Clusters

Conda

Setup/Recommendations

Please check for available conda modules (module av anaconda or module av miniconda), then load one and proceed.

> module load anaconda3              # or "module load miniconda3"
> conda init bash                    # or choose the shell you use

This needs to be done only once. It creates a section in your ~/.bashrc delimited by the markers

# >>> conda initialize >>>
...
# <<< conda initialize <<<
We recommend moving this section into a separate file such as ~/.conda_init, and placing only a source ~/.conda_init line into ~/.bashrc or, better, into ~/.profile.
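The move described above could be sketched as the following helper. This is only an illustration, not an LRZ-provided tool: the function name move_conda_block is hypothetical, and it assumes the block is delimited by exactly the two marker lines that conda init writes. A backup copy of the rc file is kept with a .bak suffix.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): move the "conda initialize" block from an
# rc file into a separate init file.
# Usage: move_conda_block <rc_file> <init_file>
move_conda_block() {
    local rc_file="$1" init_file="$2"
    local start='# >>> conda initialize >>>'
    local end='# <<< conda initialize <<<'

    # Stop early if the rc file contains no conda block at all.
    if ! grep -qF "$start" "$rc_file"; then
        echo "no conda block found in $rc_file" >&2
        return 1
    fi

    # Keep a backup so the original rc file can be restored.
    cp "$rc_file" "$rc_file.bak"

    # Copy the block (marker lines included) into the init file ...
    sed -n "/^${start}/,/^${end}/p" "$rc_file.bak" > "$init_file"

    # ... and write the rc file back without that block.
    sed "/^${start}/,/^${end}/d" "$rc_file.bak" > "$rc_file"
}
```

It might be called as move_conda_block ~/.bashrc ~/.conda_init, after which a single line source ~/.conda_init can be added to ~/.profile by hand. Checking the result in a second terminal before logging out is advisable, since a broken rc file can interfere with the next login.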

The reason for this recommendation is two-fold. 1) Cluttering the ~/.bashrc can have strange side effects, and mistakes there can prevent you from logging in successfully. 2) The ~/.bashrc is not sourced automatically in Slurm jobs, in order to avoid possible side effects inside your job. Conda can then easily be activated in a Slurm job via

source ~/.conda_init

inside your Slurm script, without any hassle from side effects of the ~/.bashrc.
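A minimal job script along these lines might look as follows. This is a sketch under assumptions: the file name conda_job.sh, the job name, and the resource values are hypothetical, the environment py3.11 is the one from the example session on this page, and cluster- or partition-specific #SBATCH options are left out because they depend on the system you submit to.

```shell
#!/usr/bin/env bash
# Sketch: write a small Slurm job script that activates conda via
# ~/.conda_init, then check its shell syntax without running it.
cat > conda_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=conda_test      # hypothetical job name
#SBATCH --ntasks=1                 # hypothetical resource request
#SBATCH --time=00:10:00            # hypothetical time limit

# ~/.bashrc is not sourced in the job, so initialize conda explicitly.
source ~/.conda_init
conda activate py3.11              # environment from the example session

python --version
EOF

# Syntax check only; this does not submit or execute the job.
bash -n conda_job.sh && echo "conda_job.sh: syntax ok"
```

The script would then be submitted with sbatch conda_job.sh.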

Usage

Once this setup is done, you can log in as usual, and conda will already be active. You can recognize this by the (base) prefix in the prompt.

(base) <userID>@<host:~>

Now you can create, modify, or remove environments according to your needs. For instance, a simple Python environment with a more recent version than the one provided by LRZ can be created as follows (please always check the Intel documentation as well).

Example conda environment session
(base) user@host:~> conda create -n py3.11 -c https://software.repos.intel.com/python/conda/ python=3.11
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.10.3
  latest version: 23.5.2

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /dss/dsshome1/00/di49zop/.conda/envs/py3.11

  added / updated specs:
    - python=3.11


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pip-22.2.2                 |  py311h06a4308_0         3.4 MB  anaconda
    python-3.11.0              |       h7a1cb2a_2        34.3 MB  anaconda
    setuptools-67.8.0          |  py311h06a4308_0         1.4 MB
    ------------------------------------------------------------
                                           Total:        39.1 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      anaconda/linux-64::_libgcc_mutex-0.1-main
  _openmp_mutex      anaconda/linux-64::_openmp_mutex-5.1-1_gnu
  bzip2              anaconda/linux-64::bzip2-1.0.8-h7b6447c_0
  ca-certificates    anaconda/linux-64::ca-certificates-2023.01.10-h06a4308_0
  ld_impl_linux-64   anaconda/linux-64::ld_impl_linux-64-2.38-h1181459_1
  libffi             anaconda/linux-64::libffi-3.4.2-h6a678d5_6
  libgcc-ng          anaconda/linux-64::libgcc-ng-11.2.0-h1234567_1
  libgomp            anaconda/linux-64::libgomp-11.2.0-h1234567_1
  libstdcxx-ng       anaconda/linux-64::libstdcxx-ng-11.2.0-h1234567_1
  libuuid            anaconda/linux-64::libuuid-1.41.5-h5eee18b_0
  ncurses            anaconda/linux-64::ncurses-6.4-h6a678d5_0
  openssl            anaconda/linux-64::openssl-1.1.1s-h7f8727e_0
  pip                anaconda/linux-64::pip-22.2.2-py311h06a4308_0
  python             anaconda/linux-64::python-3.11.0-h7a1cb2a_2
  readline           anaconda/linux-64::readline-8.2-h5eee18b_0
  setuptools         pkgs/main/linux-64::setuptools-67.8.0-py311h06a4308_0
  sqlite             anaconda/linux-64::sqlite-3.40.1-h5082296_0
  tk                 anaconda/linux-64::tk-8.6.12-h1ccaba5_0
  tzdata             anaconda/noarch::tzdata-2022a-hda174b7_0
  wheel              anaconda/noarch::wheel-0.37.1-pyhd3eb1b0_0
  xz                 anaconda/linux-64::xz-5.2.10-h5eee18b_1
  zlib               anaconda/linux-64::zlib-1.2.13-h5eee18b_0


Proceed ([y]/n)? y


Downloading and Extracting Packages
pip-22.2.2           | 3.4 MB    | ######################################################################################################### | 100% 
python-3.11.0        | 34.3 MB   | ######################################################################################################### | 100% 
setuptools-67.8.0    | 1.4 MB    | ######################################################################################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate py3.11
#
# To deactivate an active environment, use
#
#     $ conda deactivate  


(base) user@host:~> conda env list
# conda environments:
#
py3.11                   /home/user/.conda/envs/py3.11
base                  *  /dss/dsshome1/lrz/sys/spack/release/22.2.1/opt/x86_64/miniconda3/4.10.3-gcc-3vesgdq

(base) user@host:~> conda activate py3.11

(py3.11) user@host:~> python --version
Python 3.11.0

(py3.11) user@host:~> which python
/home/user/.conda/envs/py3.11/bin/python

(py3.11) user@host:~> conda deactivate                              # deactivate your environment

(base) user@host:~> conda env remove -n py3.11                      # to remove the environment again

Remove all packages in environment /home/user/.conda/envs/py3.11:

(base) user@host:~> 

Working with HPC dedicated channels 

Python for HPC comes with dedicated packages or channels that are optimized for the specific hardware. The performance gain obtained with these packages can be as much as an order of magnitude, especially for applications that use numerical libraries (numpy/scipy), parallel communication (mpi4py), or ML/AI.

Using conda without taking advantage of these is therefore strongly discouraged on LRZ machines.


Except for the GPU cloud, at the time of writing most LRZ machines feature Intel hardware, so the dedicated packages are provided by the Intel channel. The same build, based on the Intel Distribution for Python, is also provided in the standard python modules (see below).
To add the Intel channel to your conda configuration, type the following right after conda init:

~/.condarc
conda config --add channels https://software.repos.intel.com/python/conda/

Later (see Usage), when creating your environments or installing packages, you should specify the Intel channel as the preferred one, e.g. by:

~/.condarc
conda install -c https://software.repos.intel.com/python/conda numpy scipy sympy mpi4py matplotlib

You can also record your channel preferences in your ~/.condarc file, listing the channels in order of preference. For example:

~/.condarc
channels:
  - https://software.repos.intel.com/python/conda
  - conda-forge
  - bioconda
  - defaults
report_errors: false
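To double-check the order in which channels are listed in such a file, a small helper like the one below can print them with their rank. This is only a sketch: list_channels is a hypothetical name, and the parsing assumes the simple layout shown above (a top-level channels: key followed by "- item" lines). With conda loaded, conda config --show channels reports the effective list directly.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the channels of a .condarc-style
# file in their order of preference.
# Usage: list_channels <condarc_file>
list_channels() {
    awk '
        # The channel list starts after the top-level "channels:" key.
        /^channels:/ { in_list = 1; next }

        # Each "- item" line inside the list is one channel.
        in_list && /^[ \t]*-[ \t]+/ {
            sub(/^[ \t]*-[ \t]+/, "")
            printf "%d: %s\n", ++n, $0
            next
        }

        # Any other top-level key ends the list.
        in_list && /^[^ \t]/ { in_list = 0 }
    ' "$1"
}
```

It might be called as list_channels ~/.condarc; the first line printed is the channel with the highest preference.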

Note that routinely drawing packages from several different channels may cause inconsistencies (less is usually more).

Within an activated conda environment, you can install additional packages via conda install ... .

Try to avoid combining conda and pip, as the conda developers discourage this.

If you are trying to reproduce a specific build from your workstation or another system, changing the conda channel may alter package versions or specific dependencies.
If this happens, please report it through our service desk, as dedicated support for such troubleshooting is available.

Intel conda channel was removed by Intel. A workaround exists.

Python Virtual Environment

First, check for currently available python modules via module av python. Then load a python module and proceed. For example,

Python Virtual Environment Example
> module av python
------ /lrz/sys/spack/release/22.2.1/modules/x86_64/linux-sles15-x86_64 -------
python/3.7.11-base      python/3.8.11-base      
python/3.7.11-extended  python/3.8.11-extended  

> module load python/3.8.11-base                         # whatever you want

> which python
/lrz/sys/spack/release/22.2.1/views/python/3.8.11-base/bin/python

> python -m venv my_py_env/

> source my_py_env/bin/activate
(my_py_env)> which python
/home/user/my_py_env/bin/python

(my_py_env)> pip list
Package    Version
---------- -------
pip        21.1.1
setuptools 56.0.0
WARNING: You are using pip version 21.1.1; however, version 23.2 is available.
You should consider upgrading via the '/home/user/my_py_env/bin/python -m pip install --upgrade pip' command.

(my_py_env)> python -m pip install --upgrade pip
...

(my_py_env)> pip list
Package    Version
---------- -------
pip        23.2
setuptools 56.0.0

(my_py_env)> pip install numpy scipy
...

(my_py_env)> pip list
Package    Version
---------- -------
numpy      1.24.4
pip        23.2
scipy      1.10.1
setuptools 56.0.0

(my_py_env)> python
Python 3.8.11 (default, Jan 26 2022, 17:51:41) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import scipy                                                 # noticeable ... no errors
>>> 
...                                                              # and so on

(my_py_env)> deactivate                                          # deactivate python virtual environment

For removing such a python virtual environment, only the directory of this environment needs to be deleted.
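The whole life cycle (create, activate, check, deactivate, remove) can be condensed into a short script. This is a sketch under assumptions: it calls python3 rather than a python from a loaded module, creates the environment in a temporary directory, and passes --without-pip only to keep the example quick; drop that option to get pip inside the environment as in the session above.

```shell
#!/usr/bin/env bash
# Sketch: create a virtual environment in a scratch location, confirm that
# the active interpreter comes from it, then remove it again.
env_dir="$(mktemp -d)/my_py_env"

# --without-pip keeps this example fast; omit it for normal use.
python3 -m venv --without-pip "$env_dir"

source "$env_dir/bin/activate"
venv_python="$(command -v python)"     # should point into $env_dir/bin
echo "active interpreter: $venv_python"
deactivate

# Removing the environment means deleting its directory.
rm -rf "$env_dir"
```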

PIPENV

There is another alternative - pipenv. Installation is usually straightforward:

> pip install --user --upgrade pipenv

After installation, pipenv is available. (You may need to add ~/.local/bin to your PATH environment variable manually!)
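One way to extend PATH without adding duplicate entries is sketched below. It assumes pip placed the pipenv executable in ~/.local/bin, which is the usual location for --user installs on Linux; the snippet could go into ~/.profile.

```shell
#!/usr/bin/env bash
# Sketch: prepend ~/.local/bin to PATH only if it is not already listed.
case ":$PATH:" in
    *":$HOME/.local/bin:"*)
        # Already present, nothing to do.
        ;;
    *)
        export PATH="$HOME/.local/bin:$PATH"
        ;;
esac
```

Afterwards, command -v pipenv should print the path of the installed executable.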

Usage is rather simple (check --help option). For example,

> mkdir tmp && cd tmp
~/tmp > pipenv install cmake
~/tmp > cmake --version
cmake version 3.10.2
~/tmp > pipenv run bash
~/tmp > cmake --version
cmake version 3.28.1

(With Ctrl+D you can leave the environment again.) The same could also be achieved by installing cmake directly via pip (as was done for pipenv above). The point, however, is that with this environment you don't clutter your native home environment. One can have many pipenv environments in parallel.

Please be aware that this environment concept is also bound to the python used. So it is better to use the python provided by our software stack.

One more example:

> python -c 'import matplotlib'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named matplotlib

> cd tmp
~/tmp > pipenv install matplotlib
~/tmp > pipenv run python -c 'import matplotlib'      # no error thrown now!


Troubleshooting

SuperMUC-NG has no outgoing Internet Access

FAQ: Installing your own applications on SuperMUC-NG

Using Conda/Python with MPI on SuperMUC-NG (problem with EAR)

Energy Aware Runtime