The following section only covers the basic usage of the most important Slurm commands. For more information, consult the respective man pages (e.g. man sbatch) or the official Slurm documentation.


Gathering Information

Information about the state of all nodes can be obtained with sinfo, which is available on the command line by default:

sinfo

Typical output:

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
test         up    2:00:00      1   idle licca001
epyc*        up 7-00:00:00     41   idle licca[002-042]
epyc-mem     up 7-00:00:00      4   idle licca[043-046]
epyc-gpu     up 3-00:00:00      8   idle licca[047-054]
epyc-gpu-sxm up 3-00:00:00      1   idle licca055
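
To get a quick per-partition overview, the output shown above can be post-processed with standard tools. The following is only a minimal sketch: the helper name idle_nodes is hypothetical, and it assumes the default sinfo column layout shown above (NODES in column 4, STATE in column 5) and an exact state of "idle".

```shell
# Sum the NODES column per partition for lines whose STATE is exactly "idle".
# Reads sinfo's default output format from standard input.
idle_nodes() {
    awk 'NR > 1 && $5 == "idle" {
             sub(/\*$/, "", $1)   # strip the "*" that marks the default partition
             idle[$1] += $4
         }
         END { for (p in idle) print p, idle[p] }' | sort
}

# Usage on the cluster:
#   sinfo | idle_nodes
```

Partitions without idle nodes do not appear in the result.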

Listing Jobs

List all running and pending jobs
squeue
List own running and pending jobs
squeue -u $USER

Job Status and Reasons for Pending Jobs

squeue also shows the Job Status (column ST, e.g. R for Running and PD for Pending; for more status values see here) and the Reason why a Job is still pending (column NODELIST(REASON), e.g. (Resources), (Priority) or (PartitionTimeLimit); for more possible reasons see here).
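
When only the pending Jobs and their Reasons are of interest, squeue can filter by state and print selected fields. This is a sketch, not a site-specific recommendation: the function name pending_reasons is hypothetical, while -h (no header), -t (state filter) and -o (output format, with %i for the JobID and %r for the Reason) are standard squeue options.

```shell
# Print "<jobid> <reason>" for each pending Job of the current user.
pending_reasons() {
    # -h drops the header line, -t PENDING keeps only pending Jobs,
    # -o selects the fields: %i = JobID, %r = Reason.
    squeue -h -u "$USER" -t PENDING -o "%i %r"
}

# Usage on the cluster:
#   pending_reasons
```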

Creating and Submitting a Job

Jobs are typically submitted using the sbatch command. First, a batch script has to be created, typically in the work directory where the Job will run later:

myjob.sl
#!/usr/bin/env bash

# Use a job name that describes your job (not too long)
#SBATCH --job-name=test
# Select a partition (epyc, epyc-mem or epyc-gpu)
#SBATCH --partition=epyc
# Request memory (default 512M)
#SBATCH --mem=1G
# Events when a mail is sent
#SBATCH --mail-type=END,INVALID_DEPEND
# Send mail to this address. Fill in valid mail address or delete this line.
#SBATCH --mail-user=<e-mail address>
# Timelimit 1 day (max 7 days)
#SBATCH --time=1-0

# Always assume your application might be multithreaded. 
# Safeguard to limit number of threads to number of requested CPU cores.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# Run your application
srun my_program
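
The safeguard in the script above relies on shell parameter expansion: Slurm only sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested, so ${SLURM_CPUS_PER_TASK:-1} falls back to 1 otherwise. The following sketch illustrates this outside of a Job; the helper name thread_count is hypothetical and the variable is set by hand purely for demonstration.

```shell
# Return the value the Job script would export as OMP_NUM_THREADS.
thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

unset SLURM_CPUS_PER_TASK
thread_count    # prints 1 (no --cpus-per-task requested)

SLURM_CPUS_PER_TASK=8
thread_count    # prints 8 (as if --cpus-per-task=8 had been requested)
```

To actually use more than one thread, a line such as #SBATCH --cpus-per-task=8 would have to be added to the batch script.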

The Job can then be submitted with sbatch.

Submit the Job
sbatch myjob.sl

If the job was successfully submitted, you will receive a JobID.

$ sbatch myjob.sl
Submitted batch job 360035
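
In scripts it is often convenient to capture the JobID for later use with squeue, scancel or scontrol. A possible sketch uses the standard sbatch option --parsable, which prints only the JobID (optionally followed by ";" and a cluster name). The function name submit_job is hypothetical.

```shell
# Submit a batch script and print only the numeric JobID.
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    # --parsable prints "jobid" or "jobid;cluster"; keep the part before ";".
    echo "${out%%;*}"
}

# Usage on the cluster:
#   jobid=$(submit_job myjob.sl)
#   squeue -j "$jobid"
```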

In case of syntax errors or invalid sbatch options, a descriptive error message will be issued.

Stopping/Canceling a Job


Stop or cancel a Job by supplying one or more JobIDs.
scancel 360035 

Other examples:

Cancel all Jobs of a user
scancel -u $USER
Cancel all Jobs by name
scancel -n jobname
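
The selection options can also be combined. As a further sketch, the following cancels only those Jobs of the current user that are still pending and leaves running Jobs untouched; -t (state filter) is a standard scancel option, while the function name cancel_pending is hypothetical.

```shell
# Cancel all pending Jobs of the current user; running Jobs are not affected.
cancel_pending() {
    scancel -u "$USER" -t PENDING
}

# Usage on the cluster:
#   cancel_pending
```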

Updating properties of pending Jobs

The scontrol utility can be used to modify submitted Jobs that have not started yet. Jobs that are already running can no longer be modified.

Update a property of a job
# Modify the Timelimit
scontrol update jobid=$jobid TimeLimit=3-0

For a list of updatable fields see here.
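
To check the current value before or after an update, the Job record can be inspected with scontrol show job. The following sketch extracts the TimeLimit field from that record; the function name job_timelimit is hypothetical, and it assumes the usual key=value layout of the scontrol show job output.

```shell
# Print the current TimeLimit of a Job, given its JobID.
job_timelimit() {
    # Put each key=value pair on its own line, then keep the TimeLimit value.
    scontrol show job "$1" | tr ' ' '\n' | sed -n 's/^TimeLimit=//p'
}

# Usage on the cluster:
#   job_timelimit 360035
#   scontrol update jobid=360035 TimeLimit=3-0
#   job_timelimit 360035
```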

sacct

sview

Resource Limits