Slurm Options

Slurm provides three commands to reserve resource allocations or to submit jobs:

  • salloc: to reserve allocations for interactive tasks
  • srun: to run so-called job steps or small interactive jobs
  • sbatch: to submit jobs to a queue for processing

Extensive documentation on the salloc, srun and sbatch commands can be found in the Slurm documentation (salloc, srun, sbatch) or in the man page of each command, e.g. $ man sbatch.

The most commonly used parameters for these commands are listed below. Detailed information on important options can also be found in separate articles.

Parameter List

Option Description
-A --account The project account that is billed for your job. For example:
-A m2_zdvhpc
--account=hpckurs
Mandatory. Looking for your account?
-p --partition The partition your job should run in. For example:
-p parallel
--partition=smp
Mandatory. Look up available partitions.
-n --ntasks Controls the number of tasks to be created for the job (=cores, if no advanced topology is given). For example:
-n 4
-N --nodes The number of nodes you need. For example:
--nodes=2
-t --time Set the runtime limit of your job (within the partition constraints). For example to specify 1 hour:
-t 01:00:00
More details on the format can be found under Specifying Runtime below.
-J --job-name Sets an arbitrary name for your job that is used when listing jobs. Defaults to the script name. For example:
--job-name=my_analysis
--ntasks-per-node Controls the maximum number of tasks per allocated node.
-c --cpus-per-task The number of CPUs per task.
--mem The amount of memory per node. Different units can be specified using [K|M|G|T] (default is M for megabytes). See the Memory reservation page for details and hints, particularly with respect to partition default memory settings.
--mem-per-cpu The amount of memory per CPU. See above for the units.
-o --output Directs both stdout and stderr into a single file. (Slurm writes buffered output; shell-based redirection does not.) For example:
-o <filename>.log
-e --error Combined with -o, directs stdout to the log file and stderr to a separate error log file. For example:
-o <filename>.log
-e <filename>.err
-i --input Connects the batch script’s standard input directly to the specified file. For example:
-i <filename>
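
The options above are typically combined in the header of a job script. The following is a minimal sketch rather than a recommended configuration: the account and partition are the example values from the table and must be replaced with your own, and the job name example_job is a hypothetical placeholder.

```shell
#!/bin/bash
# Account and partition are the example values from the table above.
#SBATCH --account=m2_zdvhpc
#SBATCH --partition=smp
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
# Hypothetical job name, reused in the log file names below.
#SBATCH --job-name=example_job
#SBATCH --output=example_job.%j.log
#SBATCH --error=example_job.%j.err

# Slurm sets SLURM_JOB_ID and SLURM_NTASKS inside a job. The fallback
# values only exist so that the script can be tried outside of Slurm.
job_id="${SLURM_JOB_ID:-none}"
ntasks="${SLURM_NTASKS:-1}"
echo "job ${job_id} was started with ${ntasks} task(s)"
```

All #SBATCH lines have to appear before the first executable command of the script; sbatch stops reading directives once it reaches one.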

You may use one or more replacement symbols in these file names. A replacement symbol is a percent sign “%” followed by a letter (e.g. %j). For example, job%4j.out yields job0128.out for the job ID 128.

%A Job array’s master job allocation number.
%a Job array ID (index) number.
%J jobid.stepid of the running job. (e.g. “128.0”)
%j jobid of the running job.
%s stepid of the running job.
%u User name.
%x Job name.
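
To make the effect of the symbols concrete, the following sketch imitates two expansions with printf. It is only an emulation in plain bash, not what Slurm runs internally; the job ID 128 and the job name example_job are hypothetical values.

```shell
#!/bin/bash
# Hypothetical values standing in for what Slurm would substitute.
job_id=128
job_name="example_job"

# job%4j.out: the job ID, zero-padded to four digits.
padded_name="$(printf 'job%04d.out' "$job_id")"
# %x.%j.out: the job name, a dot, and the unpadded job ID.
named_output="$(printf '%s.%d.out' "$job_name" "$job_id")"

echo "$padded_name"    # job0128.out
echo "$named_output"   # example_job.128.out
```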

Other important parameters / features on MOGON include:

Once a job has been submitted, you can get information on it or control it with the commands described in the sections below.

Specifying Runtime

Requesting runtime is straightforward: The -t or --time flag can be used in srun/salloc and sbatch alike:

srun --time <time reservation>

Or within a script

#SBATCH -t <time reservation>

where <time reservation> can be any of the acceptable time formats:

  • minutes,
  • minutes:seconds,
  • hours:minutes:seconds,
  • days-hours,
  • days-hours:minutes and
  • days-hours:minutes:seconds.

Time resolution is one minute, and second values are rounded up to the next minute. A time limit of zero requests that no time limit be imposed, meaning that the maximum runtime of the partition will be used.
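
To illustrate how the formats and the rounding rule fit together, here is a small bash sketch that converts a time specification into whole minutes. The function name slurm_time_to_minutes is hypothetical, and this is not Slurm's own parser: it assumes well-formed input and does not validate ranges.

```shell
#!/bin/bash
# Convert a Slurm-style time specification into whole minutes.
slurm_time_to_minutes() {
    local spec="$1" days=0 hours=0 minutes=0 seconds=0
    local -a parts
    if [[ "$spec" == *-* ]]; then
        # days-hours[:minutes[:seconds]]
        days="${spec%%-*}"
        IFS=: read -r -a parts <<< "${spec#*-}"
        hours="${parts[0]:-0}"
        minutes="${parts[1]:-0}"
        seconds="${parts[2]:-0}"
    else
        IFS=: read -r -a parts <<< "$spec"
        case "${#parts[@]}" in
            1) minutes="${parts[0]}" ;;
            2) minutes="${parts[0]}"; seconds="${parts[1]}" ;;
            3) hours="${parts[0]}"; minutes="${parts[1]}"; seconds="${parts[2]}" ;;
            *) echo "unsupported time format: $spec" >&2; return 1 ;;
        esac
    fi
    # The 10# prefix forces base 10 so that values like "08" are not read as octal.
    local total=$(( (10#$days * 24 + 10#$hours) * 60 + 10#$minutes ))
    # Any non-zero seconds value is rounded up to the next full minute.
    if (( 10#$seconds > 0 )); then
        total=$(( total + 1 ))
    fi
    echo "$total"
}

slurm_time_to_minutes 01:00:00   # 60
slurm_time_to_minutes 1-12       # 2160
slurm_time_to_minutes 10:30      # 11
```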

Default Runtime

On most of our nodes, jobs have a default runtime of 60 minutes, after which they are automatically killed unless more time is requested using the -t flag. The default runtime for a partition can be checked with

scontrol show partition <partition>

The maximum wall time (MaxTime in the output) is the maximum runtime that can be requested in a partition. Jobs that need more time have to be split up and continued in a separate job.
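
The output of scontrol show partition is fairly long. A small filter such as the following sketch can reduce it to the two time limits. It assumes the field names DefaultTime and MaxTime as printed by scontrol; the helper name partition_time_limits is hypothetical, and the sample input is made up for illustration and is not real cluster output.

```shell
#!/bin/bash
# Read `scontrol show partition <partition>` output on stdin and print
# only the DefaultTime and MaxTime fields.
partition_time_limits() {
    grep -oE '(DefaultTime|MaxTime)=[^[:space:]]+'
}

# On the cluster (partition name taken from the parameter list):
#   scontrol show partition smp | partition_time_limits

# Illustrative input only; these values are invented.
sample='   DefaultTime=01:00:00 DisableRootJobs=NO
   MaxNodes=UNLIMITED MaxTime=5-00:00:00 MinNodes=1'
limits="$(printf '%s\n' "$sample" | partition_time_limits)"
echo "$limits"
```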

Receiving mail notifications

Specify which types of mails you want to receive with:

--mail-type=<TYPE>

<TYPE> can be any of:

  • NONE,
  • BEGIN,
  • END,
  • FAIL,
  • REQUEUE,
  • STAGE_OUT (burst buffer stage out and teardown completed),
  • INVALID_DEPEND (dependency never satisfied) or
  • ALL (equivalent to BEGIN, END, FAIL, INVALID_DEPEND, REQUEUE, and STAGE_OUT)

Specify the receiving mail address using:

--mail-user=<username>@uni-mainz.de

The default value is the submitting user. We highly recommend using an internal address rather than relying on a third-party service.
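
Both options can also be set in the job script. The fragment below is a sketch; several mail types can be given as a comma-separated list, and <username> has to be replaced with your own account name.

```shell
#!/bin/bash
# Send a mail when the job ends or fails.
#SBATCH --mail-type=END,FAIL
# Replace <username> with your own account name.
#SBATCH --mail-user=<username>@uni-mainz.de

echo "the job body goes here"
```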

Signals

Slurm does not send signals unless requested. However, there are situations in which you may want to trigger a signal (e.g. in some I/O workflows). You can request a specific signal with --signal, either on the srun or sbatch command line or within a job script. The flag is used as --signal=<sig_num>[@<sig_time>]: when a job is within sig_time seconds of its end time, the signal sig_num is sent. If sig_num is specified without sig_time, the default time is 60 seconds. Due to the resolution of event handling by Slurm, the signal may be sent up to 60 seconds earlier than specified.

An example would be:

sbatch --signal=SIGUSR2@600 ...

Or within a script:

#SBATCH --signal=SIGUSR2@600

Here, the signal SIGUSR2 is sent to the application ten minutes before the job hits its walltime. Note once more that, according to the Slurm documentation, there is an uncertainty of up to 1 minute.
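
If the job script itself should react to the signal, it needs a trap handler. The sketch below makes two assumptions that are not part of the example above: it uses the B: prefix, because with sbatch the signal is otherwise delivered to the job steps only and not to the batch shell, and it uses a hypothetical handler on_sigusr2 with a short sleep standing in for the real application.

```shell
#!/bin/bash
# B: asks Slurm to signal the batch shell itself.
#SBATCH --signal=B:SIGUSR2@600

checkpoint_requested=0

# Hypothetical handler: a real job would save its state here.
on_sigusr2() {
    checkpoint_requested=1
    echo "SIGUSR2 received, saving state before the time limit" >&2
}
trap on_sigusr2 USR2

# Placeholder workload. It runs in the background because bash only
# executes a trap handler once the current foreground command returns;
# `wait` is interrupted as soon as the signal arrives.
sleep 1 &
wait "$!"
```

Because wait returns as soon as the handler has run, a real script usually has to call wait again afterwards if the application should keep running.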

Cancel Jobs

Use the

scancel <jobid>

command with the jobid of the job you want to cancel.

To cancel all your jobs, use -u or --user=<username>:

scancel -u <username>

You can also restrict the operation to jobs in a certain state with -t or --state=<jobstate>:

scancel -t <jobstate>

where <jobstate> can be:

  • PENDING
  • RUNNING
  • SUSPENDED

Using sbatch

You have to prepare a job script to submit jobs using sbatch. You can pass options to sbatch directly on the command-line or specify them in the job script file.

To submit your job use:

sbatch myjobscript
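
In scripts it is often useful to capture the ID of the submitted job. The sketch below uses the --parsable option of sbatch, which prints the job ID (optionally followed by a semicolon and the cluster name) instead of the usual message. The wrapper name submit_job is hypothetical.

```shell
#!/bin/bash
# Submit a job script and print only the numeric job ID.
# Extra sbatch options can be passed before the script name, e.g.
#   submit_job --time=02:00:00 myjobscript
submit_job() {
    local out
    # --parsable prints "jobid" or "jobid;cluster".
    out="$(sbatch --parsable "$@")" || return 1
    printf '%s\n' "${out%%;*}"
}

# On the cluster:
#   jobid="$(submit_job myjobscript)"
```

Options given on the command line take precedence over the #SBATCH lines in the script, so the same script can be reused with, for example, a different time limit.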

When does my job start?

A job is started either when it has the highest priority and the required resources are available, or when it has the opportunity to backfill. The following command gives an estimate of the date and time at which your job is expected to start, but note that the estimate is based on the workload at the current time:

squeue --start

Slurm cannot anticipate that higher-priority jobs will be submitted after yours, that machine downtime will leave fewer resources for jobs, or that job crashes will cause large jobs to start earlier than expected, so that smaller jobs scheduled for backfill lose that backfill opportunity.
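
To query the estimate for a single job, the call can be narrowed with the --jobs and --format options of squeue. The following is a sketch with a hypothetical helper name, job_start_estimate; %S is the squeue format field for the actual or expected start time.

```shell
#!/bin/bash
# Print the expected start time of a single pending job.
job_start_estimate() {
    local jobid="$1"
    squeue --start --jobs="$jobid" --noheader --format='%S'
}

# On the cluster:
#   job_start_estimate <jobid>
```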

Slurm-based Job Monitoring

For running jobs, you can retrieve information on memory usage with sstat. Detailed information on exactly which slots your job is assigned to can be retrieved with the following command:

scontrol show -d job <jobid>

For completed jobs, this information is provided by sacct, e.g.:

sacct --format JobID,Jobname,NTasks,Nodelist,MaxRSS,MaxVMSize,AveRSS,AveVMSize

For completed jobs, you can also use seff, which reports on the efficiency of a job’s CPU and memory utilisation.

seff <jobid>
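
The sstat and sacct calls from this section can be wrapped in one small helper. This is a sketch: the function name job_memory_report and its second argument are hypothetical, and the selection of sstat fields is only one possible choice.

```shell
#!/bin/bash
# Report memory usage of a job: sstat while it is running,
# sacct once it has finished.
job_memory_report() {
    local jobid="$1" state="${2:-running}"
    if [ "$state" = "running" ]; then
        sstat --allsteps --jobs="$jobid" --format=JobID,MaxRSS,AveRSS,MaxVMSize
    else
        sacct --jobs="$jobid" --format=JobID,JobName,NTasks,NodeList,MaxRSS,MaxVMSize,AveRSS,AveVMSize
    fi
}

# On the cluster:
#   job_memory_report <jobid>            # running job
#   job_memory_report <jobid> finished   # completed job
```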