General Information
Quantum ESPRESSO (QE) is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.
Discover available versions of Quantum ESPRESSO
```bash
ml av qe
```
Running Quantum ESPRESSO
QE is compiled with the Intel Classic compiler and linked against the libraries
- wannier90
- HDF5
- libXC
and can be parallelized using OpenMP, MPI, or a combination of the two.
Note that OpenMP threading tends to perform worse than MPI for QE. Prefer an MPI-heavy setup and add OpenMP threads only with care: they can easily make a run very inefficient and only rarely give a marginal benefit, see the benchmarks below.
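If you nevertheless want to experiment with hybrid MPI/OpenMP, the essential step is to match OMP_NUM_THREADS to the Slurm allocation. A minimal sketch of the relevant lines (the values mirror the 32-tasks/2-threads benchmark row below and are not a general recommendation; the input file job.in is the same placeholder as in the sample job):

```bash
# Hybrid MPI/OpenMP sketch: 32 MPI tasks with 2 OpenMP threads each on one node.
#SBATCH --nodes=1
#SBATCH --tasks-per-node=32
#SBATCH --cpus-per-task=2

# Let each MPI task spawn as many OpenMP threads as CPUs were allocated to it.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun pw.x < job.in > job-${SLURM_JOB_ID}.out
```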
Tips & Tricks
- Familiarize yourself with QE's levels of parallelization (see the example after this list).
- When the memory of a single node is a concern (while using all available CPU cores), try increasing the number of nodes. QE automatically distributes calculations and data structures across all tasks, so the memory per task decreases as the number of tasks grows.
- Pay attention to notes and warnings regarding parallelization at the beginning of the QE output. If the number of tasks or nodes is not appropriate, QE will emit warnings. Do not run such ill-parallelized calculations! Typical warnings are:
- WARNING: too many processors for an effective parallelization!
- suboptimal parallelization: some nodes have no k-points!
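As an illustration of these parallelization levels, k-point pools and the ScaLAPACK diagonalization group can be requested directly on the pw.x command line. The values below are placeholders, not recommendations; choose them according to your system and the QE documentation:

```bash
# Sketch: split 64 MPI tasks into 8 k-point pools (-nk) of 8 tasks each and
# use a 4-task ScaLAPACK group (-nd) for the diagonalization.
# The total number of tasks must be divisible by -nk; -nd must be a square number.
srun pw.x -nk 8 -nd 4 < job.in > job-${SLURM_JOB_ID}.out
```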
Sample Slurm Job
For QE the recommendation is to request relative resources (*-per-*=) and to first increase --tasks-per-node up to 128 before increasing --nodes, in order to avoid inter-node communication, which is slower than intra-node communication.
```bash
#!/usr/bin/env bash
#SBATCH --job-name=qe
#SBATCH --partition=epyc
#SBATCH --nodes=1
#SBATCH --tasks-per-node=16
#SBATCH --cpus-per-task=1    # recommended
#SBATCH --mem-per-cpu=4G
#SBATCH --mail-type=END,INVALID_DEPEND,TIME_LIMIT
# replace the email with your personal one in order to receive mail notifications:
#SBATCH --mail-user=noreply@physik.uni-augsburg.de
#SBATCH --time=1-0

ml purge
ml load qe/7.3

srun pw.x < job.in > job-${SLURM_JOB_ID}.out
```
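Save the script under a name of your choice (e.g. job.slurm, a placeholder) and submit it with sbatch job.slurm. The input file job.in is expected in the submission directory, as in the srun line above.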
Running Quantum ESPRESSO on GPU
A separate module (look out for the red built for GPU (g) flag when using ml av qe) provides a version of QE compiled with the NVHPC compiler, which allows running most (but not all) functionalities of QE on GPUs.
Note on GPU efficiency vs CPU
According to the QE developers, running QE on GPUs can reduce the computational time by a factor of 2-3, so don't expect too much benefit. According to our own measurements (see below), one A100 GPU is about three times faster than a single 64-core CPU, i.e. one A100 roughly equals 1.5 CPU nodes. In contrast, scaling to more GPUs has proven inefficient (a waste of resources), while scaling to more CPUs works much better.
Not all parts of QE are ported to GPUs. Running such not-yet-ported parts of QE on GPU nodes is not allowed. Please check the GPU efficiency after running small test calculations.
```bash
#!/usr/bin/env bash
#SBATCH --job-name=qe
#SBATCH --partition=epyc-gpu
#SBATCH --nodes=1
#SBATCH --tasks-per-node=2   # (between 1-3 for epyc-gpu nodes)
#SBATCH --cpus-per-task=1    # recommended
#SBATCH --gpus-per-task=1    # recommended
#SBATCH --mem-per-cpu=4G
#SBATCH --mail-type=END,INVALID_DEPEND,TIME_LIMIT
# replace the email with your personal one in order to receive mail notifications:
#SBATCH --mail-user=noreply@physik.uni-augsburg.de
#SBATCH --time=1-0

ml purge
ml load qe/7.3-ompi4.1-nvhpc24.1

srun pw.x < job.in > job-${SLURM_JOB_ID}.out
```
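One way to spot-check the GPU utilization of a running job is to attach a monitoring step to its allocation. This assumes a Slurm version that supports --overlap and that nvidia-smi is available on the GPU node; <JOBID> is a placeholder for your job ID:

```bash
# Run nvidia-smi inside the allocation of the running job to see GPU load and memory use.
srun --jobid=<JOBID> --overlap nvidia-smi
```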
Pseudopotentials
All potentials of the SSSP library have been made available via the environment variable PSEUDO_DIR as part of the modules. Several different versions of the SSSP potentials are located in their respective $SUBDIR folders:
Name | $SUBDIR |
---|---|
SSSP PBE Efficiency v1.3.0 | SSSP_1.3.0_PBE_efficiency |
SSSP PBE Precision v1.3.0 | SSSP_1.3.0_PBE_precision |
SSSP PBEsol Efficiency v1.3.0 | SSSP_1.3.0_PBEsol_efficiency |
SSSP PBEsol Precision v1.3.0 | SSSP_1.3.0_PBEsol_precision |
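To use one of these sets in a calculation, the job script can, for example, point QE's default pseudopotential search path at the corresponding subdirectory. This relies on PSEUDO_DIR being set by the module as described above; ESPRESSO_PSEUDO is QE's standard environment variable for the default pseudo_dir, and an explicit pseudo_dir in the &CONTROL namelist of the input would override it:

```bash
# After loading the qe module, select e.g. the SSSP PBE Efficiency set:
export ESPRESSO_PSEUDO="${PSEUDO_DIR}/SSSP_1.3.0_PBE_efficiency"
```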
Benchmarks
AUSURF112 (CPU, small size test case)
Nodes | Tasks-per-Node | OMP-Threads | Time | Efficiency* | Comment |
---|---|---|---|---|---|
1 | 1 | 1 | 1h 9m | 234% | |
1 | 2 | 1 | 35m57.86s | 224% | |
1 | 4 | 1 | 19m22.61s | 208% | |
1 | 8 | 1 | 10m57.45s | 184% | |
1 | 16 | 1 | 6m21.78s | 158% | |
1 | 32 | 1 | 3m43.01s | 136% | |
1 | 64 | 1 | 2m31.15s | 100% | |
1 | 1 | 64 | 10m5.11s | 25% | Don't do it! |
1 | 128 | 1 | 1m26.30s | 88% | |
2 | 128 | 1 | 1m 2.14s | 61% |
*Normalized to a full socket (64 cores). In runs using fewer cores the remaining cores were idle and the CPU clock was higher, which yields apparent efficiencies of around 200% and more for a serial calculation. Since this effect is hard to separate, the reference (100%) is a full socket.
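For reference, the efficiency values in the CPU tables are consistent with normalizing the consumed core time to the 64-core (one socket) reference run:

$$\text{Efficiency} = \frac{T_{64}\cdot 64}{T\cdot N_\text{cores}}, \qquad N_\text{cores} = \text{nodes}\times\text{tasks-per-node}\times\text{OMP-threads}$$

For example, for the 128-task row: (151.15 s × 64) / (86.30 s × 128) ≈ 88%.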
AUSURF112 (GPU, small size test case)
Nodes | Tasks-per-Node | GPU-sbatch-Line | Time | Efficiency | Comment |
---|---|---|---|---|---|
1 | 1 | --gpus-per-task=1 | 30.11s | 100% | |
1 | 1 | --gpus-per-task=2 | 29.10s | <50% | Not worth it. |
1 | 2 | --gpus-per-task=1 | 29.55s | <50% | Not worth it. |
1 | 2 | --gpus-per-node=2 | 29.52s | <50% | Not worth it. |
GRIR443 (CPU, medium size test case)
Nodes | Tasks-per-Node | OMP-Threads | Time | Efficiency | Comment |
---|---|---|---|---|---|
1 | 64 | 1 | 44m33.31s | 100% | |
1 | 32 | 2 | 42m22.94s | 105% | |
1 | 16 | 4 | 47m24.97s | 94% | |
1 | 128 | 1 | 25m15.45s | 88% | |
1 | 64 | 2 | 26m35.25s | 84% | |
1 | 32 | 4 | 27m53.03s | 80% | |
2 | 128 | 1 | 12m25.88s | 90% | |
2 | 64 | 2 | 13m11.59s | 84% | |
2 | 32 | 4 | 41m 8.73s | 27% | no WARNING but very inefficient |
4 | 128 | 1 | 6m 5.27s | 91% | |
4 | 64 | 2 | 6m40.83s | 83% | |
4 | 32 | 4 | 23m 7.58s | 24% | no WARNING but very inefficient |
8 | 128 | 1 | 3m35.34s | 78% | |
8 | 64 | 2 | 3m21.95s | 83% | |
8 | 32 | 4 | 11m16.31s | 25% | no WARNING but very inefficient |
16 | 128 | 1 | 8m17.43s | 17% | WARNINGS |
GRIR443 (GPU, medium size test case)
Nodes | Tasks-per-Node | GPU-sbatch-Line | Time | Efficiency | Comment |
---|---|---|---|---|---|
1 | 1 | --gpus-per-task=1 | - | - | Out of GPU-Memory |
1 | 2 | --gpus-per-task=1 | 8m18.45s | 100% | |
1 | 1 | --gpus-per-task=2 | - | - | Out of GPU-Memory |
1 | 2 | --gpus-per-node=2 | 8m13.33s | 101% | Each MPI Task will see 2 GPUs, hardly a benefit. |
2 | 2 | --gpus-per-task=1 | 5m32.41s | 75% | Not worth it. |
4 | 2 | --gpus-per-task=1 | 4m43.30s | 44% | Not worth it. |
Support
If you have any problems with Quantum ESPRESSO, please contact the IT-Physik team (preferred) or the HPC-Servicedesk.