Parallel File System (GPFS)

Overview

Every HPC node of both LiCCA and ALCC has access to the same network filesystem /hpc/gpfs2, which is a shared resource. This filesystem contains the following folders, which currently share the same performance characteristics:

User home directory: /hpc/gpfs2/home/u/$USER
User scratch directory: /hpc/gpfs2/scratch/u/$USER
Group home directory: /hpc/gpfs2/home/g/$HPC-Projekt/
Group scratch directory: /hpc/gpfs2/scratch/g/$HPC-Projekt/

Backup

All content of /hpc/gpfs2/home is backed up once a day to the Tape Library of the Rechenzentrum. All important data (e.g. results of calculations, user maintained software, etc.) should be stored in the User home or Group home directories.

Pro Tip: All data that can easily be recreated (e.g. temporary files, Python environments, etc.) should be stored in the User scratch directory (not part of the Backup).

Default Permissions and Ownerships for User and Group directories

Once Project and Cluster access have been approved, default permissions as well as user and group ownerships are applied to the four directories listed above. Permissions and ownerships of existing files and folders in these directories remain untouched.

User directories

These directories can only be accessed by the owner and nobody else (except the root user). Default permissions of newly created files and folders are 0644 and 0755, respectively, due to the default umask setting of 0022. This does not mean that other cluster users may access your files, because no regular user can get past your personal home and scratch directories, which act as gatekeepers.

Group/Project directories

These directories can (only) be accessed and modified by all group members. Files and directories created by one member can be arbitrarily modified or removed by any other group member. Note that user created files and folders in group directories won't have an ACL entry for the special group and other (everyone) permissions. Therefore the last two digits of the mode (e.g. 700) and the corresponding output of ls -l (e.g. -rwx------) are completely meaningless.

DO NOT attempt to "fix" file and folder permissions in group directories. Especially DO NOT run any kind of recursive chmod or chown in group folders (e.g. chown -R), even if you know what you are doing, because it is not necessary at all and will allocate useless extra metadata for every single file and folder.

Due to the nature of these ACL on group home and scratch directories, all files are marked as executable, and the output of ls may show all files in green. Again, there is no need to fix this.

Granting Access to User and Group directories

User directories

DO NOT make your home or scratch folder world writable (e.g. chmod 777). This is explicitly forbidden and users doing so will receive a formal warning.

Readonly access to your home and/or scratch directory can be granted to a specific group. The IdM group of choice should contain as few people as possible, because all members of this group will have read access to your personal home or scratch space this way. Recommendation: the respective rzhpc-* group of your project.

Readonly access to your home and/or scratch directory can also be granted to a specific user.

Group/Project directories

You cannot modify the ACL of home and scratch group/project directories. To get access to another group's home or scratch folder you have to apply for Project Membership.

Quota regulations and management

The GPFS filesystem is operated with quota enabled for the user and group directories in home and scratch. Users can check their current GPFS filesystem usage and quota situation on the login nodes with the command list-quota (/usr/local/bin/list-quota), with output of the following form:

johndoe@licca001:~$ list-quota
user quota: johndoe
                           Block Limits                            | File
Filesystem  Fileset  type  blocks   quota   limit  in_doubt  grace | files     quota     limit  in_doubt  grace
gpfs2       home     USR        0    512G    1.5T       80M   none |     5   2000000   6000000        38   none
gpfs2       scratch  USR        0      1T      3T         0   none |     1   4000000  12000000         0   none

There are quota set on the used block storage and also on inode usage (number of files and directories):

- If none is stated under the column grace, everything is fine.
- Current usage is stated for block storage under blocks and for the number of files under files.
- If usage is above the quota and less than the hard limit, the time remaining until writes to the corresponding resource are blocked is stated under the column grace (for example: 28 days).

You can exceed the quota for some time (the grace time) up to a hard limit (your quota times three). The grace time (for blocks and inodes) is set to 30 days. Quota monitoring is in place and will send you a one-time "warning" message once you exceed any of your quota. You will get a second message ("critical") if the remaining grace time is under one week. Please clean up your directories at this point at the latest. Open a ticket with our Service-desk if this is a problem. After the end of the grace time, no further writes are possible!

There are also quota set on the HPC-project-group directories in home and scratch. list-quota -g additionally shows the quota for all HPC-project-group directories where the user is a member:

johndoe@licca001:~$ list-quota -g
user quota: johndoe
                           Block Limits                            | File
Filesystem  Fileset  type  blocks   quota   limit  in_doubt  grace | files     quota     limit  in_doubt  grace
gpfs2       home     USR        0    512G    1.5T       80M   none |     5   2000000   6000000        38   none
gpfs2       scratch  USR        0      1T      3T         0   none |     1   4000000  12000000         0   none

group home fileset: home.g.test
                     Block Limits                            | File
Filesystem  type     blocks   quota   limit  in_doubt  grace | files     quota     limit  in_doubt  grace
gpfs2       FILESET       0    2.5T    7.5T         0   none |     1  10000000  30000000         0   none

group scratch fileset: scratch.g.test
                     Block Limits                            | File
Filesystem  type     blocks   quota   limit  in_doubt  grace | files     quota     limit  in_doubt  grace
gpfs2       FILESET       0  5.039T  15.12T         0   none |     1  20000000  60000000         0   none

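Because quota also apply to the number of files and directories, it can help to see which subdirectories hold the most entries before cleaning up. The following is a minimal sketch and not a tool provided on the clusters: the function name count_inodes is hypothetical, and it only relies on standard find, wc, tr and sort. File names containing newlines would be counted more than once.

```shell
#!/bin/bash
# Hypothetical helper (not part of list-quota): for every immediate
# subdirectory of the given directory, including hidden ones, print the
# number of files and directories below it, largest first.
count_inodes() {
    local root="${1:-.}"
    find "$root" -mindepth 1 -maxdepth 1 -type d | while IFS= read -r dir; do
        # -xdev keeps find on one filesystem; each output line is one entry.
        printf '%s\t%s\n' "$(find "$dir" -xdev | wc -l | tr -d ' ')" "$dir"
    done | sort -rn
}
```

A possible call would be count_inodes /hpc/gpfs2/home/u/$USER | head, which lists the ten subdirectories with the most entries. On large directory trees the scan itself creates metadata load on the shared filesystem, so it should be run sparingly.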
Local Node Filesystem

Every node provides a locally shared temporary directory /tmp, about 800G (shared) in size, provided by an enterprise grade local SSD drive. There is no quota enforced on this drive at the moment. This directory is a private directory; it will only be seen by your Job. Avoid using all of its space at once and allow other users to make use of the Local Node Filesystem as well.

Data retention policy

Data in /tmp will be deleted right after your Job terminates! Make sure that you copy back important files before your Job ends.

A typical Job using the Local Node Filesystem has at least three steps. Care must be taken that SLURM logfiles are not copied to the Local Node Filesystem. In the worst case this can lead to Job crashes, and the logfiles will always be overwritten when copied back. Never use cp * $TMP!

Handling Timelimit-situations for Jobs using the Local Node Filesystem

If you are unsure how long your Job will take, it might run into the timelimit. Make sure you implement a mechanism to copy back important intermediate results in this case, because the private /tmp directory will be deleted right at the end (timeout or not) of a Job.
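One common way to structure such a Job, sketched here under assumptions, is to copy the input to the local directory, run the calculation there, and copy the results back, with a signal handler that also copies results back shortly before the timelimit. The directory names input/ and results/, the files data.txt and sorted.txt, and the sort command standing in for the real calculation are placeholders. The sbatch option --signal is standard SLURM, but whether 300 seconds is enough time to copy your data back is something only you can judge.

```shell
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --signal=B:USR1@300   # assumption: signal the batch shell 300 s before the timelimit

# Copy everything from the local work directory back to the submit directory.
# Used at the normal end of the Job and by the USR1 handler.
stage_out() {
    mkdir -p "$SUBMIT_DIR/results"
    cp -r "$WORK_DIR/." "$SUBMIT_DIR/results/"
}

main() {
    SUBMIT_DIR="${SLURM_SUBMIT_DIR:-$PWD}"
    WORK_DIR="$(mktemp -d "${TMP:-/tmp}/job.XXXXXX")"
    trap 'stage_out; exit 1' USR1

    # Step 1: copy only the input data (not "*", which would include SLURM logfiles).
    cp -r "$SUBMIT_DIR/input/." "$WORK_DIR/"

    # Step 2: placeholder calculation, run in the background so that the
    # trap can fire while the shell sits in "wait".
    ( cd "$WORK_DIR" && sort data.txt > sorted.txt ) &
    wait $!

    # Step 3: copy the results back before the Job ends.
    stage_out
}

# Run the workflow only inside a SLURM Job.
if [ -n "${SLURM_JOB_ID:-}" ]; then main; fi
```

Running the calculation in the background and waiting for it matters: bash only executes a trap handler once the current foreground command has returned, so a long foreground calculation would delay the handler until it is too late.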
RAM disk (tmpfs)

Every Job can make use of a local RAM disk located at /dev/shm, which has a significantly higher performance (both I/O operations per second and bandwidth) than the filesystem on the local SSD disk. The usage is similar to the local SSD storage (see above). Contrary to disk storage, RAM disk storage requirements have to be added to the requested amount of RAM. The maximum size of the RAM disk is limited to approx. 50% of the total amount of RAM per node, i.e. 500G for epyc and epyc-gpu nodes, and 2T for epyc-mem nodes.

This directory is a private directory; it will only be seen by your Job.

The memory request (for example #SBATCH --mem=12G) must cover the RAM used by your application plus the data stored on the RAM disk. Failure to do so will result in your Job being terminated by the OOM (Out-Of-Memory) killer.

Handling Timelimit-situations for Jobs using the RAM disk

If you are unsure how long your Job will take, it might run into the timelimit. Make sure you implement a mechanism to copy back important intermediate results in this case, because the private /dev/shm directory will be deleted right at the end (timeout or not) of a Job.

Do not submit Jobs with significantly more than 8G per CPU core on the epyc partition. Use the epyc-mem partition for high memory applications instead.
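The memory arithmetic can be written down as a small helper. This is only an illustrative sketch: the function name mem_request is hypothetical, the split into 4G for the application and 8G for the RAM disk is an assumed example, and the check uses a plain "more than 8G per core" threshold where the text above says "significantly more".

```shell
#!/bin/bash
# Hypothetical helper: add application memory and RAM disk usage (both in G)
# and print the matching #SBATCH line. A hint is written to stderr when the
# total exceeds 8G per requested CPU core.
mem_request() {
    local app_gb="$1" ramdisk_gb="$2" cores="$3"
    local total_gb=$(( app_gb + ramdisk_gb ))
    echo "#SBATCH --mem=${total_gb}G"
    if [ "$total_gb" -gt $(( 8 * cores )) ]; then
        echo "more than 8G per core: consider the epyc-mem partition" >&2
    fi
}

# Assumed example: 4G for the application, 8G on /dev/shm, 2 CPU cores.
mem_request 4 8 2
```

With these assumed numbers the helper prints #SBATCH --mem=12G and no hint, because 12G on 2 cores stays below 8G per core.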
Performance

The GPFS shows optimal performance with sequential read and write patterns (typically large files). Avoid random and high frequency access patterns (typically small files). Avoid the creation of large numbers of small files (>1000) in a single directory. Because GPFS is a network filesystem, every I/O operation involves a small latency during which your calculation remains idle. Since the GPFS is a shared resource, the performance for all other users may vary and strongly depends on the filesystem load created by a single user, either globally or within a single node. If you cannot avoid highly frequent I/O operations, it is almost always much more efficient to use the local node filesystem or the RAM disk (see the corresponding sections). To help you decide on the optimal storage for your use case, we ran a couple of benchmarks.