The following section only covers the basic usage of the most important Slurm commands. For more information, consult the respective man pages (e.g. man sbatch) or the official Slurm documentation.
Gathering Information
Information about the state of all nodes can be obtained with sinfo, which is available on the command line by default:
sinfo
Typical output:
$ sinfo
PARTITION     AVAIL  TIMELIMIT   NODES  STATE  NODELIST
test          up     2:00:00         1  idle   licca001
epyc*         up     7-00:00:00     41  idle   licca[002-042]
epyc-mem      up     7-00:00:00      4  idle   licca[043-046]
epyc-gpu      up     3-00:00:00      8  idle   licca[047-054]
epyc-gpu-sxm  up     3-00:00:00      1  idle   licca055
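In this output, the asterisk after epyc marks the default partition, i.e. the one used when a job does not request a partition. Because sinfo prints one line per partition and node state, getting a quick count of free nodes requires summing those lines. The following is a minimal sketch with a hypothetical function name; it assumes input in the compact format produced by sinfo -h -o "%P %D %t" (partition, node count, compact state):

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum up idle nodes per partition.
# Reads lines of the form "PARTITION NODES STATE" from stdin, e.g. from
#   sinfo -h -o "%P %D %t"
count_idle_nodes() {
  awk '$3 == "idle" {
         sub(/\*$/, "", $1)   # drop the "*" that marks the default partition
         idle[$1] += $2       # a partition may appear on several lines
       }
       END { for (p in idle) print p, idle[p] }' | sort
}
```

On the cluster it would be called as sinfo -h -o "%P %D %t" | count_idle_nodes.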
Listing Jobs
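All jobs currently in the queue are listed with squeue:
squeue
To list only your own jobs, filter by user name:
squeue -u $USER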
Job Status and Reasons for Pending Jobs
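squeue shows each job's status in the ST column as a compact code, and for pending jobs the NODELIST(REASON) column shows why the job is still waiting, e.g. Priority (jobs with higher priority are ahead in the queue) or Resources (the job is waiting for resources to become available). The hypothetical helper below translates the most common state codes listed in the squeue man page; it is a sketch, not an exhaustive list:

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate squeue's compact state codes (ST column).
# Job ID, state code and reason of your own jobs can be printed with:
#   squeue -u "$USER" -o "%.10i %t %r"
describe_state() {
  case "$1" in
    PD)  echo "PENDING (waiting to be scheduled)" ;;
    R)   echo "RUNNING" ;;
    CG)  echo "COMPLETING (job is finishing)" ;;
    CD)  echo "COMPLETED" ;;
    CA)  echo "CANCELLED" ;;
    F)   echo "FAILED" ;;
    TO)  echo "TIMEOUT (time limit reached)" ;;
    OOM) echo "OUT_OF_MEMORY" ;;
    *)   echo "other state: $1 (see man squeue)" ;;
  esac
}
```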
Creating and Submitting a Job
Jobs are typically submitted using the sbatch command. First, create a batch script, usually in the working directory where the job will later run:
#!/usr/bin/env bash
# Use a job name that describes your job (not too long)
#SBATCH --job-name=test
# Select a partition (epyc, epyc-mem or epyc-gpu)
#SBATCH --partition=epyc
# Request memory (default 512M)
#SBATCH --mem=1G
# Events when a mail is sent
#SBATCH --mail-type=END,INVALID_DEPEND
# Send mail to this address. Fill in a valid mail address or delete this line.
#SBATCH --mail-user=noreply@uni-augsburg.de
# Time limit 1 day (max 7 days)
#SBATCH --time=1-0
# Always assume your application might be multithreaded.
# Safeguard to limit the number of threads to the number of requested CPU cores.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
# Run your application
srun my_program
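The OMP_NUM_THREADS safeguard relies on the shell's ${VAR:-default} expansion: Slurm only sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested, so otherwise the thread count falls back to 1. The function name below is hypothetical and only illustrates the expansion outside of a job:

```shell
#!/usr/bin/env bash
# Hypothetical helper showing the same expansion as the batch script.
threads_for_job() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Without --cpus-per-task, SLURM_CPUS_PER_TASK is not set: fall back to 1
unset SLURM_CPUS_PER_TASK
threads_for_job                           # prints 1

# With e.g. "#SBATCH --cpus-per-task=8", Slurm sets SLURM_CPUS_PER_TASK=8
SLURM_CPUS_PER_TASK=8 threads_for_job     # prints 8
```

To actually run a multithreaded program, add a line such as #SBATCH --cpus-per-task=8 to the batch script.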
The job can then be submitted with sbatch:
sbatch myjob.sl
If the job was submitted successfully, sbatch prints its JobID:
$ sbatch myjob.sl
Submitted batch job 360035
In case of syntax errors or invalid sbatch options, a descriptive error message is issued.
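For scripting, sbatch --parsable prints only the JobID (followed by ;clustername on multi-cluster setups) instead of the full message. A minimal sketch, assuming a hypothetical wrapper named submit_job:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: submit a batch script and print only its JobID.
# sbatch --parsable outputs "jobid" or "jobid;cluster"; strip the cluster part.
submit_job() {
  local out
  out=$(sbatch --parsable "$1") || return 1   # propagate submission errors
  echo "${out%%;*}"
}
```

Usage: jobid=$(submit_job myjob.sl). The resulting $jobid can then be passed to scancel or scontrol.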
Stopping/Canceling a Job
To cancel a job, pass its JobID to scancel:
scancel 360035
Other examples:
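A few common scancel variants, wrapped in hypothetical helper functions. The options --user, --state and --name are standard scancel options; the function names are only illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around standard scancel options.

# Cancel all of your own jobs (pending and running)
cancel_all_my_jobs() {
  scancel --user="$USER"
}

# Cancel only your jobs that are still waiting in the queue
cancel_my_pending_jobs() {
  scancel --user="$USER" --state=PENDING
}

# Cancel your jobs with a given job name, e.g. "test" from the batch script above
cancel_my_jobs_by_name() {
  scancel --user="$USER" --name="$1"
}
```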
Updating Properties of Pending Jobs
The scontrol utility can be used to modify submitted jobs that have not started yet. Jobs that are already running can no longer be modified.
# Modify the time limit of a pending job (JobID as reported by sbatch)
jobid=360035
scontrol update jobid=$jobid TimeLimit=3-0
For a list of updatable fields, see the scontrol documentation (man scontrol).
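Because only pending jobs can be modified, a script may check the job state before updating it. The sketch below uses a hypothetical function name together with the squeue options -h (no header), -j (select job) and -o "%t" (compact state). Note that the scontrol man page states that only the Slurm administrator or root can increase a job's TimeLimit, so regular users may only be able to lower it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: change the time limit only if the job is still pending.
update_pending_timelimit() {
  local jobid=$1 limit=$2 state
  state=$(squeue -h -j "$jobid" -o "%t")   # compact state, e.g. PD or R
  if [ "$state" != "PD" ]; then
    echo "Job $jobid is not pending (state: ${state:-unknown}); not updating." >&2
    return 1
  fi
  scontrol update JobId="$jobid" TimeLimit="$limit"
}
```

Usage: update_pending_timelimit 360035 12:00:00.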