Slurm Options
With SLURM there are three commands to reserve resource allocations and to submit jobs:

`salloc`
: to reserve allocations for interactive tasks

`srun`
: to run so-called job steps or small interactive jobs

`sbatch`
: to submit jobs to a queue for processing
Extensive documentation on the `salloc`, `srun` and `sbatch` commands can be found in the SLURM documentation (salloc, srun, sbatch) or in the man pages of each command, e.g. `$ man sbatch`.
The most commonly used parameters for these commands are listed below. Detailed information on important options can also be found in separate articles.
Parameter List
Option | Description |
---|---|
-A --account | The project account that is billed for your job. For example: `-A m2_zdvhpc` or `--account=hpckurs`. Mandatory. Looking for your account? |
-p --partition | The partition your job should run in. For example: `-p parallel` or `--partition=smp`. Mandatory. Look up available partitions. |
-n --ntasks | Controls the number of tasks to be created for the job (= cores, if no advanced topology is given). For example: `-n 4` |
-N --nodes | The number of nodes you need. For example: `--nodes=2` |
-t --time | Sets the runtime limit of your job (within the partition constraints). For example, to specify 1 hour: `-t 01:00:00`. More details on the format here. |
-J --job-name | Sets an arbitrary name for your job that is used in job listings. Defaults to the script name. For example: `--job-name=my_first_job` |
--ntasks-per-node | Controls the maximum number of tasks per allocated node. |
-c --cpus-per-task | Number of CPUs per task. |
-C --constraint | Which processor architecture to use. For example: `-C broadwell` or `--constraint=skylake`. Read more about this constraint here. |
--mem | The amount of memory per node. Different units can be specified using [K|M|G|T] (default is M for megabytes). See the Memory reservation page for details and hints, particularly with respect to partition default memory settings. |
--mem-per-cpu | The amount of memory per CPU. See above for the units. |
-o --output | Directs stdout and stderr into one file. (SLURM writes buffered; shell-based solutions do not write buffered.) |
-o <filename>.log -e <filename>.err | Directs stdout to the log file and stderr to the error log file. |
-i <filename> | Instructs Slurm to connect the batch script’s standard input directly to the specified file. |
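To illustrate how these options combine, here is a minimal job script sketch; the account, partition, resource values and program name are placeholders, not recommendations:

```bash
#!/bin/bash
# Project account (mandatory) and partition (mandatory) -- placeholders
#SBATCH -A <your project account>
#SBATCH -p parallel
# 4 tasks and a runtime limit of 1 hour
#SBATCH -n 4
#SBATCH -t 01:00:00
# Job name and combined stdout/stderr file (%j = jobid)
#SBATCH -J mytest
#SBATCH -o mytest.%j.out

srun ./my_program    # placeholder application
```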
You may use one or more replacement symbols, which are a percent sign “%” followed by a letter (e.g. `%j`). For example, `job%4j.out` yields `job0128.out`.
Symbol | Description |
---|---|
%A | Job array’s master job allocation number. |
%a | Job array ID (index) number. |
%J | jobid.stepid of the running job. (e.g. “128.0”) |
%j | jobid of the running job. |
%s | stepid of the running job. |
%u | User name. |
%x | Job name. |
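For instance, a sketch that names the combined output file after the job name and jobid, using two symbols from the table above:

```bash
# Name the output file after the job name (%x) and jobid (%j)
#SBATCH -J myjob
#SBATCH -o %x.%j.out
```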
Other important parameters / features on MOGON include:
- Using the ramdisk
- Using local scratch space
- Specifying runtimes in accordance with host models or host names
- Using GPU Queues
CPU Architecture
On MOGON II a third important parameter is present: you may select the CPU type to be either `skylake` or `broadwell` for the Skylake and Broadwell nodes, respectively. If the architecture is not relevant for your application, select `anyarch`.
This can be set by passing `-C <selection list>` or `--constraint=<selection list>` to `sbatch` (on the command line or within a jobscript).
The defaults are:
- `broadwell` in the parallel partition
- `skylake` on the himster2 cluster (only applicable for HIM employees)

If nothing is specified you will get `broadwell`, except for the himster2 partition, where it is going to be `skylake`. On the bigmem partition the default depends on your requested memory per node.
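For example, to explicitly request Skylake nodes (a sketch; choose `broadwell` or `anyarch` analogously, and the script name is a placeholder):

```bash
# Within a jobscript: request Skylake nodes
#SBATCH -C skylake

# Or on the command line:
sbatch --constraint=skylake myjobscript.sh
```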
You can get a list of features and resources of each node with `sinfo`. The output lists each node together with its available features (e.g. `broadwell`, `skylake`), CPU count and memory.
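A possible invocation (the format string is an assumption, adjust it to your needs):

```bash
# Per-node listing: hostname (%n), features (%f), CPU count (%c), memory in MB (%m)
sinfo --Node --format="%n %f %c %m"
```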
Specifying Runtime
Requesting runtime is straightforward: the `-t` or `--time` flag can be used with `srun`/`salloc` and `sbatch` alike:
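For example (the script name is a placeholder):

```bash
# On the command line; srun and salloc accept the same flag
sbatch -t <time reservation> myjobscript.sh
```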
Or within a script:
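For instance:

```bash
#!/bin/bash
# Reserve the requested runtime for this job
#SBATCH --time=<time reservation>
```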
where `<time reservation>` can be any of the acceptable time formats:
`minutes`, `minutes:seconds`, `hours:minutes:seconds`, `days-hours`, `days-hours:minutes` and `days-hours:minutes:seconds`.
Time resolution is one minute, and second values are rounded up to the next minute. A time limit of zero requests that no time limit be imposed, meaning that the maximum runtime of the partition will be used.
Default Runtime
Most partitions have a default runtime of 10 minutes, after which jobs are automatically killed unless more time was requested using the `-t` flag. The default runtime for a partition can be checked with `sinfo`:
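For instance (one possible format string, a sketch):

```bash
# Partition name (%P), default time limit (%L) and maximum time limit (%l)
sinfo --format="%P %L %l"
```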
The max wall time is the maximum requestable runtime on a node. Jobs that need more time have to be split up and continued in a separate job.
Receiving mail notifications
Specify which types of mails you want to receive with:
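In a job script this might look like the following (the option can equally be given on the command line):

```bash
# Send mail on the selected event types
#SBATCH --mail-type=<TYPE>
```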
`<TYPE>` can be any of: `NONE`, `BEGIN`, `END`, `FAIL`, `REQUEUE`, `STAGE_OUT` (burst buffer stage out and teardown completed), `INVALID_DEPEND` (dependency never satisfied) or `ALL` (equivalent to `BEGIN`, `END`, `FAIL`, `INVALID_DEPEND`, `REQUEUE`, and `STAGE_OUT`).
Specify the receiving mail address using:
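For example (the address is a placeholder):

```bash
# Address the notifications are sent to
#SBATCH --mail-user=<your email address>
```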
The default value is the submitting user. We highly recommend using an internal address rather than relying on a third-party service.
Signals
Slurm does not send signals unless requested. However, there are situations where you may want to trigger a signal (e.g. in some IO workflows). You can request a specific signal with `--signal`, passed either to `srun` or to `sbatch` (from within a script). The flag is used like `--signal=<sig_num>[@<sig_time>]`: when a job is within `sig_time` seconds of its end time, the signal `sig_num` is sent. If a `sig_num` is specified without any `sig_time`, the default time is 60 seconds. Due to the resolution of event handling by Slurm, the signal may be sent up to 60 seconds earlier than specified.
An example would be:
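A sketch on the command line (the script name is a placeholder):

```bash
# USR2 (= SIGUSR2) will be sent 600 s (10 min) before the job's end time
sbatch --signal=USR2@600 myjobscript.sh
```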
Or within a script:
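For instance:

```bash
# USR2 (= SIGUSR2) is sent 600 s (10 min) before the job's end time
#SBATCH --signal=USR2@600
```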
Here, the signal SIGUSR2 is sent to the application ten minutes before hitting the walltime of the job. Note once more that the Slurm documentation states that there is an uncertainty of up to one minute.
Cancel Jobs
Use the `scancel` command with the jobid of the job you want to cancel:
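For example (the jobid is a placeholder):

```bash
# Cancel a single job by its jobid
scancel <jobid>
```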
In case you want to cancel all your jobs, use `-u`, `--user=`:
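For example:

```bash
# Cancel all jobs belonging to your user ($USER expands to your user name)
scancel -u $USER
```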
You can also restrict the operation to jobs in a certain state with `-t`, `--state=<jobstate>`, where `<jobstate>` can be:
- `PENDING`
- `RUNNING`
- `SUSPENDED`
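For example, to cancel only your pending jobs (a sketch):

```bash
# Cancel all of your jobs that are still waiting in the queue
scancel -u $USER -t PENDING
```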
Using sbatch
You have to prepare a job script in order to submit jobs with `sbatch`. You can pass options to `sbatch` directly on the command line or specify them in the job script file.
To submit your job use:
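Assuming the script is called `myjobscript.sh` (a placeholder name):

```bash
sbatch myjobscript.sh
```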
When does my Job start
A job starts either when it has the highest priority and the required resources are available, or when it has the opportunity to backfill. The following command gives an estimate of the date and time when your job is expected to start; note that the estimate is based on the workload at the current time:
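One way to query this is `squeue` with the `--start` option (a sketch):

```bash
# Expected start times of your pending jobs
squeue --start -u $USER
```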
Slurm cannot anticipate that higher-priority jobs will be submitted after yours, that machine downtime will reduce the resources available to jobs, or that job crashes will let large jobs start earlier than expected, causing smaller jobs scheduled for backfilling to lose that backfill opportunity.
Slurm-based Job Monitoring
For running jobs you can retrieve information on memory usage with `sstat`. Detailed information on exactly which slots your job is assigned to can be retrieved with the following command:
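Possible sketches (the jobid is a placeholder; the field selection is an assumption):

```bash
# Nodes and exact CPU IDs allocated to a running job
scontrol show job -d <jobid>

# Memory usage of a running job's steps
sstat -j <jobid> --format=JobID,MaxRSS,AveRSS
```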
For completed jobs, this information is provided by `sacct`, e.g.:
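For instance (the jobid is a placeholder and the field selection is an example):

```bash
# Resource usage of a finished job, including the memory high-water mark
sacct -j <jobid> --format=JobID,JobName,Partition,MaxRSS,Elapsed,State
```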
For completed jobs, you can also use `seff`, which reports on the efficiency of a job’s CPU and memory utilisation.
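Typical usage (the jobid is a placeholder):

```bash
# Summarise CPU and memory efficiency of a completed job
seff <jobid>
```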