General Information

Description

TURBOMOLE is a collaborative, multi-national software development project aiming to provide highly efficient and stable computational tools for quantum chemical simulations of molecules, clusters, periodic systems, and solutions. The TURBOMOLE software suite is optimized for widely available, inexpensive, and resource-efficient hardware such as multi-core workstations and small computer clusters. TURBOMOLE specializes in electronic structure methods with outstanding accuracy-cost ratio, such as density functional theory including local hybrids and the random phase approximation (RPA), GW-Bethe–Salpeter methods, second-order Møller–Plesset theory, and explicitly correlated coupled-cluster methods. TURBOMOLE is based on Gaussian basis sets and has been pivotal for the development of many fast and low-scaling algorithms in the past three decades, such as integral-direct methods, fast multipole methods, the resolution-of-the-identity approximation, imaginary frequency integration, Laplace transform, and pair natural orbital methods, see https://www.turbomole.org/turbomole/.

Usage conditions and Licensing

TURBOMOLE may only be used by users or groups holding a license (development or commercial).


Running TURBOMOLE

Lmod modules

Starting with v7.8.1 there is a turbomole module, which loads the commercial version by default. An error message is displayed if access is denied. The module can also be used with your own TURBOMOLE (developer) version: if the TURBODIR environment variable is already set, that directory is used instead. Be aware that this module does a lot of configuration under the hood and reads information from Slurm environment variables within a Slurm job, so make sure you do the usual ml purge && ml turbomole cycle in your job file. When loaded outside a Slurm job, 4 CPU cores and SMP mode are used.
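
A minimal sketch of the two ways to load the module (the TURBODIR path below is only a placeholder; adjust it to your own installation):

# Use the commercial version provided on the cluster (default)
ml purge && ml turbomole

# Or use a custom/developer installation: set TURBODIR before loading the module
export TURBODIR=$HOME/TURBOMOLE   # placeholder path, replace with your own TURBODIR
ml purge && ml turbomole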

SLURM Job template

The following template is recommended for all TURBOMOLE users. It ensures that

  • all environment variables needed for parallelism are set properly (depending on the SBATCH parameters).
  • all temporary files are placed on a local ramdisk (huge speedup compared to GPFS) and the necessary directories are created on all nodes.
  • all results are copied back after the calculation is done.
  • one hour before the job runs into its time limit, a stopfile is written, triggering TURBOMOLE to gracefully finish its current cycle and then shut down.


Make sure to adapt the #SBATCH --mail-user line to your needs!

Job Template using Lmod module
#!/usr/bin/env bash
#SBATCH --job-name=tm
#SBATCH --partition=epyc
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=4G
#SBATCH --mail-type=END,INVALID_DEPEND,TIME_LIMIT
#SBATCH --mail-user=noreply@physik.uni-augsburg.de
#SBATCH --time=1-0
# send signal 1 hour before job times out
#SBATCH --signal=B:SIGUSR1@3600

# Turbomole distributed by cosmologic/Turbomole GmbH includes all needed libraries.
# Other Turbomole distributions might need an Intel environment (depending on how it was compiled): module load intel
# To use a custom Turbomole installation set TURBODIR before loading the module
# export TURBODIR=

# By default, TURBOTMPDIR is set to a Ramdisk within a Slurm Job
# To use local SSD for TURBOTMPDIR, set
# export TMPDIRMODE=SSD
# To force using GPFS for TURBOTMPDIR, set (heavily discouraged!)
# export TMPDIRMODE=GPFS

module purge
module load turbomole

function write_stop_and_wait() {
    # set stopfile
    touch ${TURBOTMPDIR}/stop
    echo -e "\nSTOPFILE has been written! Waiting for last CYCLE to finish..."
    wait
}
trap 'write_stop_and_wait' SIGUSR1

# only the main node needs the input files
echo -e "\nCopying files ..."
shopt -s extglob
# Attention is needed when the file extension of the SLURM logfile is changed from *.out via --output or --error (!)
# These files should not be copied, as this has led to incomplete logfiles and race conditions in the past.
# $ALLNODES is set by the module and will run a command once per node in case of multi-node MPI mode.
$ALLNODES mkdir -p ${TURBOTMPDIR}
$ALLNODES cp -pv "${SLURM_SUBMIT_DIR}"/!(*.out) ${TURBOTMPDIR}
pushd ${TURBOTMPDIR}
echo

# Actual job goes here. It is essential to send the job into the background using & and then wait for it!
echo -e "\nStarting Job ..."
jobex -level scf -ri -c 50 -energy 6 -gcart 3 &
wait

echo -e "\nCopying files back ..."
cp -rpuv ${TURBOTMPDIR}/* "${SLURM_SUBMIT_DIR}"

if [[ -f ${SLURM_SUBMIT_DIR}/stop ]]; then
	rm ${SLURM_SUBMIT_DIR}/stop
	# maybe requeue here...
else
	echo -e "\nAll done!"
fi
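
Assuming the template above is saved as tm.slurm (the filename is just an example), a typical submission looks like this:

# submit the job script to SLURM
sbatch tm.slurm
# check the status of your jobs in the queue
squeue -u $USER
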
Deprecated Job Template
#!/usr/bin/env bash
#SBATCH --job-name=tm
#SBATCH --partition=epyc
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --mail-type=END,INVALID_DEPEND,TIME_LIMIT
#SBATCH --mail-user=noreply@physik.uni-augsburg.de
#SBATCH --time=1-0

# Turbomole distributed by cosmologic includes all needed libraries.
# Other Turbomole distributions might need an Intel environment (depending on how it was compiled):
# module load intel

export TURBODIR=<INSERT YOUR TURBODIR HERE>

# Set Turbomole environment variables according to SBATCH parameters
if [[ ${SLURM_NNODES} -gt 1  || ( ${SLURM_CPUS_PER_TASK} -eq 1 && ${SLURM_NTASKS} -gt 1 ) ]] ; then
    export PARA_ARCH=MPI
    if [[ ${SLURM_CPUS_PER_TASK} -gt 1 ]] ; then
        echo -e "Using hybrid MPI/OpenMP over ${SLURM_NNODES} nodes with ${SLURM_CPUS_PER_TASK} threads each!\n"
        export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
    else
        echo -e "Using MPI over ${SLURM_NNODES} nodes with ${SLURM_NTASKS} tasks!\n"
        export PARNODES=${SLURM_NTASKS}
        export OMP_NUM_THREADS=1
    fi
else
    if [[ ${SLURM_CPUS_PER_TASK} -gt 1 ]] ; then
        echo -e "Using SMP(OpenMP) with ${SLURM_CPUS_PER_TASK} threads!\n"
        export PARA_ARCH=SMP
        export PARNODES=${SLURM_CPUS_PER_TASK}
    else
        echo -e "Using serial mode (1 CPU core)!\n"
    fi
fi

# Print some Job info
env | egrep "(SLURM_TASKS_PER_NODE|SLURM_JOB_NODELIST|SLURM_MEM|SLURM_JOBID|SLURM_JOB_PARTITION)" | sort

# Use local Temp dir to avoid slowdowns due to disk-i/o
export TURBOTMPDIR=/tmp

# Increase stack size
ulimit -s unlimited

# Load further Turbomole configurations...
source $TURBODIR/Config_turbo_env

# only the main node needs the input files
echo -e "\nCopying files ..."
shopt -s extglob
# Attention is needed when the file extension of the SLURM logfile is changed from *.out via --output or --error (!)
# These files should not be copied, as this has led to incomplete logfiles and race conditions in the past.
cp -pv "${SLURM_SUBMIT_DIR}"/!(*.out) ${TURBOTMPDIR}
cd ${TURBOTMPDIR}
echo

# Long-running parts should be prefixed with: timeout $((SLURM_JOB_END_TIME - $(date +%s) - 900))
# This ensures that the job runs into a timeout 15 minutes (900 s) before it is killed
# by SLURM, leaving enough time to copy back files and clean everything up.
# Actual job goes here:
timeout $((SLURM_JOB_END_TIME - $(date +%s) - 900)) jobex -level scf -ri -c 50 -energy 6 -gcart 3

echo -e "\nCopying files back ..."
cp -rpuv ${TURBOTMPDIR}/* "${SLURM_SUBMIT_DIR}"

echo -e "\nAll done!"

Serial calculations

The template above is already an example of a serial calculation. Serial calculations use the following settings:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1

OpenMP parallelization

Important!

SMP (OpenMP) is the most common and most efficient mode of operation for TURBOMOLE; this is what you should use most of the time.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4

MPI parallelization

MPI-parallel jobs are generally less efficient than SMP (OpenMP)-parallel jobs. Use this mode only when you need more CPU cores than a single node can provide, which is rarely the case. Note that --ntasks-per-node is important here: without this setting, SLURM may distribute the tasks arbitrarily, which TURBOMOLE often does not tolerate (core dumps, segfaults).

# Example using 32 CPU cores spread over 4 (= 32/8) nodes, i.e. 8 MPI tasks per node:
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=1

Hybrid OpenMP/MPI parallelization

Only some executables (dscf, grad, aoforce, ricc2, escf, and pnoccsd) allow this type of parallelization, so make sure your job supports it!

# Launch 4 MPI tasks, one per node, and run 28 threads on each node.
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=28

GPU Support

Since TURBOMOLE 7.7, GPU support is available for the executables escf, egrad, aoforce, mpshift, ridft, and rdgrad, but it is limited to a maximum of 1 GPU and SMP parallelization only. The module will fail to load in other cases.

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --gpus=a100:1

Support

If you have any problems with TURBOMOLE, please contact the IT-Physik team (preferred) or the HPC-Servicedesk.

If you have improvements to this documentation that other users could benefit from, please reach out as well!