This section covers only the basic usage of the most important Slurm commands. For more information, consult the respective man pages (e.g. man sbatch) or the official Slurm documentation.
Gathering Information
Information about the state of all nodes can be obtained with sinfo, which is available on the command line by default:
sinfo
typical output:
$ sinfo
PARTITION     AVAIL  TIMELIMIT   NODES  STATE  NODELIST
test          up     2:00:00     1      idle   licca001
epyc*         up     7-00:00:00  41     idle   licca[002-042]
epyc-mem      up     7-00:00:00  4      idle   licca[043-046]
epyc-gpu      up     3-00:00:00  8      idle   licca[047-054]
epyc-gpu-sxm  up     3-00:00:00  1      idle   licca055
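The tabular output is easy to post-process. The following is a minimal sketch, assuming the default column layout shown above (partition in column 1, node count in column 4, state in column 5). The helper name `idle_nodes` is hypothetical and not part of Slurm. It sums the idle nodes per partition from `sinfo` output read on standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): sum idle nodes per partition.
# Assumes the default sinfo columns: PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
idle_nodes() {
    awk 'NR > 1 && $5 == "idle" {
             sub(/\*$/, "", $1)      # strip the "*" that marks the default partition
             count[$1] += $4         # a partition may appear on several lines
         }
         END { for (p in count) print p, count[p] }' | sort
}

# Usage (on a login node):
#   sinfo | idle_nodes
```

Summing rather than printing the first match matters because `sinfo` can list one partition on several lines, one per node state.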
Listing Jobs
squeue
squeue -u $USER
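With many jobs in the queue, it can help to count them by state. This is a minimal sketch, assuming the `-h` (no header) and `-o "%T"` (print only the job state) options of `squeue`. The helper name `count_states` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how often each value occurs on stdin and
# print "<state> <count>", sorted by state name.
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (on a login node):
#   squeue -h -u "$USER" -o "%T" | count_states
```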
Job Status and Reasons for Pending Jobs
Creating and Submitting a Job
Jobs are typically submitted using the sbatch command. First, create a batch script, typically in the working directory where the job will run later on:
#!/usr/bin/env bash
# Use a job name that describes your job (not too long)
#SBATCH --job-name=test
# Select a partition (epyc, epyc-mem or epyc-gpu)
#SBATCH --partition=epyc
# Request memory (default 512M)
#SBATCH --mem=1G
# Events when a mail is sent
#SBATCH --mail-type=END,INVALID_DEPEND
# Send mail to this address. Fill in valid mail address or delete this line.
#SBATCH --mail-user=noreply@uni-augsburg.de
# Timelimit 1 day (max 7 days)
#SBATCH --time=1-0
# Always assume your application might be multithreaded.
# Safeguard to limit number of threads to number of requested CPU cores.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
# Run your application
srun my_program
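The `${SLURM_CPUS_PER_TASK:-1}` expansion in the script falls back to 1 whenever the variable is unset or empty. Slurm sets `SLURM_CPUS_PER_TASK` only when `--cpus-per-task` is requested, so the script as written runs single-threaded. The sketch below illustrates that fallback with a hypothetical helper named `thread_count`.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the safeguard in the batch script:
# use SLURM_CPUS_PER_TASK if it is set and non-empty, otherwise 1.
thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# In a job submitted with "#SBATCH --cpus-per-task=8" the variable is 8;
# without that option the fallback value 1 is used.
export OMP_NUM_THREADS="$(thread_count)"
```

To run a multithreaded application on several cores, add a line such as `#SBATCH --cpus-per-task=8` to the batch script (the value 8 is only an example).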
The job can then be submitted with sbatch:
sbatch myjob.sl
If the job was submitted successfully, you will receive a JobID.
$ sbatch myjob.sl
Submitted batch job 360035
In case of syntax errors or invalid sbatch options, a descriptive error message is issued.
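When a script needs the JobID of a freshly submitted job (for example, to cancel or modify it later), the ID can be extracted from the confirmation line. This is a minimal sketch, assuming the message format "Submitted batch job <id>". The helper name `job_id_from_sbatch` is hypothetical. As an alternative, `sbatch --parsable` prints the JobID without the surrounding text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read sbatch output on stdin and print the JobID.
# Assumes the confirmation line has the form "Submitted batch job <id>".
job_id_from_sbatch() {
    awk '/^Submitted batch job [0-9]+/ { print $4 }'
}

# Usage (on a login node):
#   jobid=$(sbatch myjob.sl | job_id_from_sbatch)
```

If the submission fails, the helper prints nothing, so an empty `$jobid` can serve as an error check.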
Stopping/Canceling a Job
scancel 360035
Other examples:
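A hedged sketch of one such case is cancelling all pending jobs of a user while leaving running jobs untouched. It assumes the `--user` and `--state` options of `scancel`. The wrapper name `cancel_pending` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: cancel all pending jobs of a user.
# Argument: <user> (optional, defaults to the current user).
cancel_pending() {
    local user="${1:-$USER}"
    scancel --user="$user" --state=PENDING
}

# Usage (on a login node):
#   cancel_pending
```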
Updating properties of pending Jobs
The scontrol utility can be used to modify submitted jobs that have not started yet. Jobs that are already running can no longer be modified.
# Modify the Timelimit
scontrol update jobid=$jobid TimeLimit=3-0
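Because only pending jobs can be modified, a script may want to check the job state before calling scontrol. This is a minimal sketch, assuming that `squeue -h -j <id> -o "%T"` prints the state of the given job. The wrapper name `set_timelimit_if_pending` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: change the time limit only if the job is still pending.
# Arguments: <jobid> <timelimit>, e.g. 360035 3-0
set_timelimit_if_pending() {
    local jobid="$1" limit="$2" state
    state=$(squeue -h -j "$jobid" -o "%T")
    if [ "$state" = "PENDING" ]; then
        scontrol update jobid="$jobid" TimeLimit="$limit"
    else
        echo "job $jobid is not pending (state: ${state:-unknown}); not modified" >&2
        return 1
    fi
}

# Usage (on a login node):
#   set_timelimit_if_pending 360035 3-0
```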
For a list of updatable fields, see the scontrol man page (man scontrol).