Local Node Filesystem

Every node provides a local temporary directory /tmp, about 800 GB in size, backed by an enterprise-grade local SSD. Its capacity is shared by all Jobs running on that node, and no quota is currently enforced on it.

This directory is private: it is visible only to your Job.

Avoid using all of its space at once, so that other users' Jobs on the same node can make use of the Local Node Filesystem as well.
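If you want to check how much space is currently free on the node's /tmp before staging large files, you can query it from within your Job script, for example:

df -h /tmp    ### shows size, used and available space on the Local Node Filesystem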

Data retention policy

Data in /tmp will be deleted right after your Job terminates! Make sure that you copy back important files before your Job ends.

A typical Job using the Local Node Filesystem has at least three steps:

  1. Copy necessary data from GPFS to /tmp.
  2. Run your calculation there.
  3. Move the results from the node back to GPFS.
Example (Script part only)
#!/usr/bin/env bash

#SBATCH options ...

# Step 1
TMP=/tmp
cp job.inp job.dat "$TMP"

# Step 2
### change to dir $TMP
pushd "$TMP"
srun your_application
### change back to the starting dir
popd

# Step 3
### move the results back, for example 'job_result.out' to your home directory ~/
mv "$TMP"/job_result.out ~/

Take care that SLURM logfiles are not copied to the Local Node Filesystem. In the worst case this can crash your Job, and the logfile will in any case be overwritten when it is copied back. Never use cp * $TMP !!
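As a sketch of a safer Step 1, name the input files explicitly instead of using a wildcard (the file names below are just the placeholders from the example above):

# Step 1, safe variant: copy only the files your calculation needs
cp job.inp job.dat "$TMP"
### never: cp * "$TMP"   (this would also catch the live SLURM logfile)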

Handling time-limit situations for Jobs using the Local Node Filesystem

If you are unsure how long your Job will take, it might run into the time limit. Make sure you implement a mechanism to copy back important intermediate results in that case, because the private /tmp directory will be deleted at the end of the Job, whether it finishes normally or is killed at the time limit.
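One possible mechanism (a sketch, not an official template) is to ask SLURM to signal the batch script shortly before the time limit and to trap that signal; the 600-second lead time, the choice of SIGUSR1 and the file name job_result.out are assumptions you should adapt to your Job:

#!/usr/bin/env bash

#SBATCH options ...
### assumption: send SIGUSR1 to the batch shell 600 seconds before the time limit
#SBATCH --signal=B:USR1@600

TMP=/tmp

### copy intermediate results back to the home directory when the signal arrives
rescue_results() {
    echo "Time limit approaching - copying intermediate results back"
    cp "$TMP"/job_result.out ~/ || true
}
trap rescue_results USR1

# Step 1
cp job.inp job.dat "$TMP"

# Step 2
pushd "$TMP"
### run the application in the background and 'wait', so the trap can fire
srun your_application &
wait
popd

# Step 3
mv "$TMP"/job_result.out ~/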