PRACE Course: HPC Code Optimisation Workshop 2022
Contents
With the ever-growing complexity of computer architectures, code optimisation has become the main route to keeping pace with hardware advancements and to making effective use of current and upcoming High Performance Computing systems.
Have you ever asked yourself:
- Where are the performance bottlenecks of my application?
- What is the maximum speed-up achievable on the architecture I am using?
- Does my code scale well across multiple machines?
- Does my implementation match my HPC objectives?
In this workshop, we will discuss these questions and provide a unique opportunity to learn techniques, methods and solutions for improving code, enabling new hardware features and visualising the potential benefits of an optimisation process.
We will describe the latest micro-processor architectures and how developers can efficiently use modern HPC hardware, including SIMD vector units and the memory hierarchy. We will also touch upon exploiting intra-node and inter-node parallelism.
Attendees will be guided along the optimisation process through the incremental improvement of an example application. Through hands-on exercises they will learn how to enable vectorisation using simple pragmas and more effective techniques like changing data layout and alignment.
The work is guided by hints from compiler reports and by profiling tools such as Intel® Advisor, Intel® VTune™ Amplifier, Intel® Application Performance Snapshot and LIKWID, which are used to investigate and improve the performance of an HPC application.
You can ask the lecturers in the Q&A session about how to optimise your code. Please provide a description of your code in the registration form.
Learning Goals
Through a sequence of simple, guided examples of code modernisation, attendees will develop an awareness of the features of multi- and many-core architectures that are crucial for writing modern, portable and efficient applications.
A special focus will be dedicated to scalar and vector optimisations for the Intel® Xeon® Scalable processor, code-named Skylake, utilised in the SuperMUC-NG machine at LRZ.
The workshop interleaves lecture and practical sessions.
Preliminary Agenda
Session | Topic (Lecturer)
1st day morning | Intro (Volker Weinberg)
1st day afternoon | HPC Architecture, Vectorisation
2nd day morning | Profiling: Code Instrumentation, Roofline Model, Intel Advisor (Jonathan Coles)
2nd day afternoon | Debuggers (Gerald Mathias)
3rd day morning | LIKWID (Carla Guillen / Thomas Gruber)
3rd day afternoon | Optimisation highlights by LRZ (CXS Group, LRZ)
The workshop is a PRACE training event organised by LRZ in cooperation with NHR@FAU.
Lecturers
Dr. Patrick Böhl, Dr. Jonathan Coles, Dr. Gerald Mathias, Dr. Carla Guillen, Nisarg Patel, Dr. Josef Weidendorfer (LRZ)
Thomas Gruber (NHR@FAU)
Slides and Exercises
Recommended Access Tools
- Exercises will be done on the CoolMUC-2 cluster @ LRZ with 28-way Haswell-based nodes and an FDR14 InfiniBand interconnect.
- Please use your own laptop or PC with X11 support and an ssh client installed for the hands-on sessions.
Under Windows
- Install and run the Xming X11 server for Windows: https://sourceforge.net/projects/xming/ and then install and run the terminal software PuTTY: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
- Alternatively, we recommend installing MobaXterm (https://mobaxterm.mobatek.net/download-home-edition.html), which already includes an X11 server.
Under macOS
- Install X11 support for macOS via XQuartz: https://www.xquartz.org/
Under Linux
- ssh and X11 support come with all distributions.
Login under Windows:
- Start Xming and after that PuTTY.
- Enter the host name lxlogin1.lrz.de into the PuTTY host field and click Open.
- Accept & save the host key [only on first login].
- Enter the user name and password (provided by LRZ staff) into the opened console.
Login under Mac:
- Install X11 support for macOS via XQuartz: https://www.xquartz.org/
- Open Terminal
- ssh -Y lxlogin1.lrz.de -l username
- Use user name and password (provided by LRZ staff)
Login under Linux:
- Open xterm
- ssh -Y lxlogin1.lrz.de -l username
- Use user name and password (provided by LRZ staff)
How to use the CoolMUC-2 System
Login nodes: lxlogin1.lrz.de (see the login instructions above).
The reservation is only valid during the workshop; for general use of our Linux Cluster, remove the "--reservation=hcow1s22" option.
- Submit a job:
sbatch --reservation=hcow1s22 job.sh
- List own jobs:
squeue -M cm2
- Cancel jobs:
scancel -M cm2 jobid
- Show reservations:
sinfo -M cm2 --reservation
- Interactive Access:
salloc -M cm2 --time=00:30:00 --reservation=hcow1s22 --partition=cm2_std
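For batch jobs, a typical sequence during the hands-on sessions might look as follows (123456 is only a placeholder for the job ID that sbatch reports):
sbatch --reservation=hcow1s22 job.sh
squeue -M cm2
scancel -M cm2 123456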
Details: https://doku.lrz.de/display/PUBLIC/Running+parallel+jobs+on+the+Linux-Cluster
Examples: https://doku.lrz.de/display/PUBLIC/Example+parallel+job+scripts+on+the+Linux-Cluster
Resource limits: https://doku.lrz.de/display/PUBLIC/Resource+limits+for+parallel+jobs+on+Linux+Cluster
Example OpenMP Batch File
#!/bin/bash
# Standard output file; %j expands to the job ID, %N to the node name
#SBATCH -o /dss/dsshome1/0D/hpckurs99/test.%j.%N.out
# Working directory of the job
#SBATCH -D /dss/dsshome1/0D/hpckurs99
#SBATCH -J test
#SBATCH --clusters=cm2
#SBATCH --partition=cm2_std
#SBATCH --nodes=1
#SBATCH --qos=unlimitnodes
# One CoolMUC-2 node provides 28 cores
#SBATCH --cpus-per-task=28
#SBATCH --get-user-env
# Workshop reservation; remove for general use
#SBATCH --reservation=hcow1s22
#SBATCH --time=02:00:00
module load slurm_setup
# Use one OpenMP thread per allocated core
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Replace with the actual OpenMP program to run
echo hello, world
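If the script is saved as job.sh, it can be submitted and its output inspected as shown below (the concrete job ID and node name in the output file name are filled in by Slurm):
sbatch --reservation=hcow1s22 job.sh
ls /dss/dsshome1/0D/hpckurs99/test.*.out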
Intel Software Stack
The Intel software stack is automatically loaded at login. The Intel compilers are called icc (for C), icpc (for C++) and ifort (for Fortran). They behave similarly to the GNU compiler suite (the option -help shows an option summary). For reasonable optimisation including SIMD vectorisation, use the options -O3 -xavx (you can use -O2 instead of -O3 and sometimes get better results, since the compiler occasionally tries to be overly smart and undoes hand-coded optimisations).
By default, OpenMP directives in your code are ignored. Use the -qopenmp option to activate OpenMP.
Use mpiexec -n #tasks to run MPI programs. The compiler wrappers' names follow the usual mpicc, mpifort, mpiCC pattern.
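For illustration, building and running an OpenMP code and an MPI code might look as follows; the source and executable names (saxpy.c, ring.c) are placeholders, not files provided by the course:
# OpenMP: compile with vectorisation and OpenMP enabled, then run with 28 threads
icc -O3 -xavx -qopenmp -o saxpy saxpy.c
export OMP_NUM_THREADS=28
./saxpy
# MPI: use the compiler wrapper and launch with mpiexec
mpicc -O3 -xavx -o ring ring.c
mpiexec -n 4 ./ring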
Intel OneAPI
The most recent version of the Intel software stack, "Intel OneAPI", can be loaded with
module load intel-oneapi
Upon loading the main intel-oneapi module, the default modules intel, intel-mpi and intel-mkl are unloaded and replaced by the corresponding intel-oneapi-* variants. Further intel-oneapi-* modules are available via the module command.
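For example, the standard module commands can be used to see which intel-oneapi-* variants are installed and which are currently loaded:
module avail intel-oneapi
module list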
PRACE Survey
Please fill out the PRACE online survey under
tbd.
This helps us and PRACE to increase the quality of the courses, to design the future training programme at LRZ and in Europe according to your needs and wishes, and to obtain future funding for training events.