General Information

Quantum ESPRESSO (QE) is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.

Discover available versions of Quantum ESPRESSO

ml av qe
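
A specific version can then be loaded via ml load with one of the listed module names, e.g. (this is the version used in the sample job below):

ml load qe/7.3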

Running Quantum ESPRESSO

QE is compiled with the Intel Classic compiler, linked against the libraries

  • wannier90
  • HDF5
  • libXC

and can be parallelized using OpenMP, MPI or a combination thereof.

Note that OpenMP threading tends to perform worse than MPI for QE. Prefer MPI-heavy configurations and use additional OpenMP threading very carefully: it can easily run very inefficiently and only rarely gives a small benefit (see the benchmarks below).
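
If you nevertheless want to experiment with hybrid MPI+OpenMP runs, a minimal sketch of the relevant Slurm settings is (the values are placeholders, not a tuned recommendation):

#SBATCH --tasks-per-node=32
#SBATCH --cpus-per-task=2 # cores reserved per MPI task for OpenMP threads

# one OpenMP thread per reserved core
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun pw.x < job.in > job-${SLURM_JOB_ID}.out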

Tips & Tricks

  • Familiarize yourself with QE's levels of parallelization (see the command-line sketch after this list).
  • When the memory of a single node (while using all available CPU cores) is a concern, try increasing the number of nodes. QE automatically distributes calculations and data structures across all tasks, so the memory per task decreases as the number of tasks increases.
  • Pay attention to notes and warnings regarding parallelization at the beginning of the QE output. If the number of tasks or nodes is not appropriate, QE will print warnings. Do not run such ill-parallelized calculations! Typical warnings are:
    • WARNING: too many processors for an effective parallelization!
    • suboptimal parallelization: some nodes have no k-points!
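
As a starting point for exploring the levels of parallelization, pw.x accepts command-line flags that control how the MPI tasks are split into groups; a minimal sketch (the group sizes are placeholders and must be tuned to your system):

# -nk: number of k-point pools, -nd: number of tasks used for parallel linear algebra
srun pw.x -nk 4 -nd 16 < job.in > job-${SLURM_JOB_ID}.out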

Sample Slurm Job

For QE the recommendation is to request relative resources ( *-per-*= ) and to first increase --tasks-per-node up to 128 before increasing --nodes, in order to avoid inter-node communication, which is slower than intra-node communication.

Sample sbatch file "qe.sl"
#!/usr/bin/env bash
#SBATCH --job-name=qe
#SBATCH --partition=epyc
#SBATCH --nodes=1
#SBATCH --tasks-per-node=16
#SBATCH --cpus-per-task=1 # recommended
#SBATCH --mem-per-cpu=4G
#SBATCH --mail-type=END,INVALID_DEPEND,TIME_LIMIT
# replace the email with your personal one in order to receive mail notifications:
#SBATCH --mail-user=noreply@physik.uni-augsburg.de
#SBATCH --time=1-0

ml purge
ml load qe/7.3

srun pw.x < job.in > job-${SLURM_JOB_ID}.out
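
The script is then submitted as usual with:

sbatch qe.sl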


Running Quantum ESPRESSO on GPU

A separate module (look out for the red "built for GPU (g)" flag in the output of ml av qe ) provides a version of QE compiled with the NVHPC compiler, which allows running most (but not all) of QE's functionality on GPUs.

Note on GPU efficiency vs CPU

According to the QE developers, running QE on GPUs can reduce the computational time by a factor of 2-3, so don't expect too large a benefit. According to our own measurements (see below), one A100 GPU is about three times faster than a single 64-core CPU, i.e. one A100 roughly equals 1.5 CPU nodes. Scaling to more GPUs has proven to be inefficient (a waste of resources), while scaling to more CPUs works much better.

Not all parts of QE are GPU-accelerated. Running parts of QE that are not yet GPU-enabled on GPU nodes is not allowed. Please check the GPU efficiency after running small test calculations.
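
One possible way to check the GPU utilization of a running job is to attach an overlapping job step and call nvidia-smi (assuming Slurm's --overlap option is available; replace <jobid> with your job ID):

srun --jobid=<jobid> --overlap nvidia-smi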


Sample GPU sbatch file "qegpu.sl"
#!/usr/bin/env bash
#SBATCH --job-name=qe
#SBATCH --partition=epyc-gpu
#SBATCH --nodes=1
#SBATCH --tasks-per-node=2 # (between 1-3 for epyc-gpu nodes)
#SBATCH --cpus-per-task=1 # recommended
#SBATCH --gpus-per-task=1 # recommended
#SBATCH --mem-per-cpu=4G
#SBATCH --mail-type=END,INVALID_DEPEND,TIME_LIMIT
# replace the email with your personal one in order to receive mail notifications:
#SBATCH --mail-user=noreply@physik.uni-augsburg.de
#SBATCH --time=1-0

ml purge
ml load qe/7.3-ompi4.1-nvhpc24.1

srun pw.x < job.in > job-${SLURM_JOB_ID}.out

Pseudopotentials

All potentials of the SSSP library have been made available via the environment variable PSEUDO_DIR as part of the modules. Several different versions of the SSSP potentials are located in their respective $SUBDIR folders:

Name                           $SUBDIR
SSSP PBE Efficiency v1.3.0     SSSP_1.3.0_PBE_efficiency
SSSP PBE Precision v1.3.0      SSSP_1.3.0_PBE_precision
SSSP PBEsol Efficiency v1.3.0  SSSP_1.3.0_PBEsol_efficiency
SSSP PBEsol Precision v1.3.0   SSSP_1.3.0_PBEsol_precision
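
A minimal sketch of how one of these folders can be referenced in a pw.x input: the &CONTROL block is written from within the job script so that $PSEUDO_DIR (set by the qe module) is expanded; the chosen subfolder and prefix are only examples.

# write the &CONTROL block of a pw.x input; $PSEUDO_DIR is set by the qe module
cat > control.in <<EOF
&CONTROL
  calculation = 'scf'
  prefix      = 'myjob'
  pseudo_dir  = '${PSEUDO_DIR}/SSSP_1.3.0_PBE_efficiency'
/
EOF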

Benchmarks

AUSURF112 (CPU, small size test case)

Nodes  Tasks-per-Node  OMP-Threads  Time       Efficiency*  Comment
1      1               1            1h 9m      234%
1      2               1            35m57.86s  224%
1      4               1            19m22.61s  208%
1      8               1            10m57.45s  184%
1      16              1            6m21.78s   158%
1      32              1            3m43.01s   136%
1      64              1            2m31.15s   100%
1      16              4            10m5.11s   25%          Don't do it!
1      128             1            1m26.30s   88%
2      128             1            1m 2.14s   61%


*Normalized to a full socket (64 cores). For runs that use only part of a socket, the remaining cores are idle and the CPU clock boosts, giving almost 200% apparent efficiency for a serial calculation. Since this effect is hard to separate out, the reference (100%) is a full socket.
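
From the numbers above, the efficiency appears to be computed as efficiency = (t_ref × 64) / (t × total cores), with t_ref the time of the 64-core reference run; e.g. for 1 node × 128 tasks: (2m31.15s × 64) / (1m26.30s × 128) ≈ 88%.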

AUSURF112 (GPU, small size test case)

Nodes  Tasks-per-Node  GPU-sbatch-Line    Time    Efficiency  Comment
1      1               --gpus-per-task=1  30.11s  100%
1      1               --gpus-per-task=2  29.10s  <50%        Not worth it.
1      2               --gpus-per-task=1  29.55s  <50%        Not worth it.
1      2               --gpus-per-node=2  29.52s  <50%        Not worth it.

GRIR443 (CPU, medium size test case)

Nodes  Tasks-per-Node  OMP-Threads  Time       Efficiency  Comment
1      64              1            44m33.31s  100%
1      32              2            42m22.94s  105%
1      16              4            47m24.97s  94%
1      128             1            25m15.45s  88%
1      64              2            26m35.25s  84%
1      32              4            27m53.03s  80%
2      128             1            12m25.88s  90%
2      64              2            13m11.59s  84%
2      32              4            41m 8.73s  27%         no WARNING but very inefficient
4      128             1            6m 5.27s   91%
4      64              2            6m40.83s   83%
4      32              4            23m 7.58s  24%         no WARNING but very inefficient
8      128             1            3m35.34s   78%
8      64              2            3m21.95s   83%
8      32              4            11m16.31s  25%         no WARNING but very inefficient
16     128             1            8m17.43s   17%         WARNINGS

GRIR443 (GPU, medium size test case)

Nodes  Tasks-per-Node  GPU-sbatch-Line    Time      Efficiency  Comment
1      1               --gpus-per-task=1  --        --          Out of GPU-Memory
1      2               --gpus-per-task=1  8m18.45s  100%
1      1               --gpus-per-task=2  --        --          Out of GPU-Memory
1      2               --gpus-per-node=2  8m13.33s  101%        Each MPI task will see 2 GPUs, hardly a benefit.
2      2               --gpus-per-task=1  5m32.41s  75%         Not worth it.
4      2               --gpus-per-task=1  4m43.30s  44%         Not worth it.

Support

If you have any problems with Quantum ESPRESSO, please contact the IT-Physik team (preferred) or the HPC-Servicedesk.

