Darshan
Darshan is a scalable HPC I/O characterization tool designed to capture an accurate picture of application I/O behavior with minimal overhead. This includes properties such as access patterns within files, the number of I/O operations, and the size of those operations.
Darshan on LRZ platforms
On SuperMUC-NG and the Linux Clusters, Darshan can be used to trace dynamically linked executables of MPI applications. The installation supports applications that use Intel MPI.
Enabling Darshan to trace MPI applications
Load the module that matches your compiler (Intel compilers or GCC).
The available modules can be listed with
module av darshan-runtime
Here we will use GCC as an example:
module load darshan-runtime/3.3.1-gcc8-impi
The tracing is enabled by using the environment variable LD_PRELOAD:
LD_PRELOAD=$DARSHAN_LIBDIR/libdarshan.so
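Before preloading, it can help to confirm that the library is actually present, since a missing or misspelled path in LD_PRELOAD is easy to overlook in job output. The following is a minimal sketch; the function name darshan_preload is hypothetical, and it assumes only that the darshan-runtime module sets DARSHAN_LIBDIR as described above.

```shell
# Hypothetical helper (not part of Darshan): print the path to use for
# LD_PRELOAD, or fail if the library cannot be read.
darshan_preload() {
    lib="${DARSHAN_LIBDIR:-}/libdarshan.so"
    if [ -z "${DARSHAN_LIBDIR:-}" ] || [ ! -r "$lib" ]; then
        echo "libdarshan.so not found; is a darshan-runtime module loaded?" >&2
        return 1
    fi
    printf '%s\n' "$lib"
}
```

It could then be used as `preload=$(darshan_preload) || exit 1`, with `$preload` passed as the LD_PRELOAD value.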
Darshan uses the environment variable $DARSHAN_LOG_DIR_PATH to determine where its log files are written. By default, this variable is set to $SCRATCH/.darshan-logs. Changing it is not recommended; in particular, do not let it point to $HOME.
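To see where the logs will end up, and to catch a setting that points into $HOME, a check along the following lines could be placed in a job script. This is only a sketch: the function names are hypothetical, and the fallback value simply mirrors the default named above.

```shell
# Hypothetical helper: print the directory Darshan logs are expected in,
# falling back to the documented default when the variable is unset.
darshan_log_dir() {
    printf '%s\n' "${DARSHAN_LOG_DIR_PATH:-$SCRATCH/.darshan-logs}"
}

# Hypothetical check: return 0 only if that directory lies outside $HOME.
darshan_log_dir_ok() {
    case "$(darshan_log_dir)/" in
        "$HOME"/*)
            echo "warning: Darshan log directory is inside \$HOME" >&2
            return 1 ;;
        *)
            return 0 ;;
    esac
}
```

Calling `darshan_log_dir_ok || exit 1` before mpiexec would stop the job early instead of filling $HOME with log files.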
An example SLURM script for using Darshan on SuperMUC-NG is given below. Loading the modules and setting LD_PRELOAD work the same way on CoolMUC2. Use a darshan-runtime version that module av lists on your system.
#!/bin/bash
#SBATCH -J io_test
#SBATCH -A YOUR_PROJECT
#SBATCH -D ./
#SBATCH -o ./%x-%j-%N.out
#SBATCH -e ./%x-%j-%N.err
#SBATCH --export=NONE
#SBATCH --mail-user=YOUR_EMAIL@SOME.DOMAIN
#SBATCH --mail-type=NONE
#SBATCH --partition=test
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=48
#SBATCH --time=0:05:00

module load slurm_setup
###
module unload intel
module load gcc
###
module unload intel-mpi
module load intel-mpi/2019-gcc
###
module load darshan-runtime/3.2.1-gcc8-impi
###
mpiexec -env LD_PRELOAD=$DARSHAN_LIBDIR/libdarshan.so -env DARSHAN_LOGHINTS="" NAME_OF_YOUR_BINARY_WITH_IO
###
Setting the variable DARSHAN_LOGHINTS to an empty string, as in the mpiexec line above, is necessary because of an incompatibility between Darshan and the MPI-IO settings of Intel MPI. Otherwise, your program will likely finish successfully, but your job will hang and the Darshan log file will not be created properly.
Extracting the I/O characterization
If the program finishes correctly, the log file is written to:
$SCRATCH/.darshan-logs/<USERNAME>_<BINARY_NAME>_<SLURM_JOBID>_<DATE>_<UNIQUE_ID>_<TIMING>.darshan
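Since the file name contains the SLURM job ID, the log for a particular job can be located with a shell glob. The sketch below is an illustration only: the function name latest_darshan_log is hypothetical, and the pattern assumes nothing more than that the job ID appears in the file name delimited by non-digit characters.

```shell
# Hypothetical helper: print the newest Darshan log whose name contains
# the given SLURM job ID.
# Usage: latest_darshan_log <SLURM_JOBID> [log_dir]
latest_darshan_log() {
    jobid=$1
    logdir=${2:-${DARSHAN_LOG_DIR_PATH:-$SCRATCH/.darshan-logs}}
    # [!0-9] on both sides keeps job 123 from also matching job 1234.
    # ls -t sorts newest first; errors for "no match" are suppressed.
    ls -t "$logdir"/*[!0-9]"$jobid"[!0-9]*.darshan 2>/dev/null | head -n 1
}
```

For example, `latest_darshan_log "$SLURM_JOBID"` at the end of a job script prints the path of the log just written, or nothing if no log was created.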
Generation of a PDF summary
You can generate a PDF summary with graphs:
module load darshan-util
darshan-job-summary.pl <YOUR_DARSHAN_FILE>.darshan
Analysis on the command line (works on SNG and CM2)
On the command line, you can analyse the log file with the utility darshan-parser, which reports the recorded I/O operations and derived performance data. It has several command line options:
:~> darshan-parser --help
Usage: darshan-parser [options] <filename>
    --all   : all sub-options are enabled
    --base  : darshan log field data [default]
    --file  : total file counts
    --file-list  : per-file summaries
    --file-list-detailed  : per-file summaries with additional detail
    --perf  : derived perf data
    --total : aggregated darshan field data
If you want a detailed analysis of the I/O counters and the I/O performance in a text file, you can use the following command:
darshan-parser <YOUR_DARSHAN_FILE>.darshan > <YOUR_DARSHAN_FILE>.txt
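The default output can be large, so it may be convenient to write the more compact views to separate files. The following sketch loops over three of the options listed in the help text above (--perf, --total, --file); the function name darshan_reports and the output file naming are hypothetical choices for illustration.

```shell
# Hypothetical helper: write one text report per darshan-parser option
# next to the log file, e.g. run.darshan -> run.perf.txt, run.total.txt, ...
# Usage: darshan_reports <YOUR_DARSHAN_FILE>.darshan
darshan_reports() {
    log=$1
    base=${log%.darshan}   # strip the .darshan suffix for the output names
    for opt in perf total file; do
        darshan-parser "--$opt" "$log" > "$base.$opt.txt" || return 1
    done
}
```

The function stops at the first darshan-parser call that fails, so a truncated or unreadable log is reported through the return status.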
Documentation
Please refer to the Darshan web site for more information about the meaning of the I/O counters, other Darshan utilities, and static tracing.