Using Slurm

Submitting jobs to MOGON with Slurm

The Simple Linux Utility for Resource Management, or SLURM for short, has evolved from a simple resource manager into a highly capable scheduler. It is deployed on many high-performance computing systems worldwide and plays a central role in your daily work on MOGON by fulfilling several functions:

  • Slurm allocates resources within the cluster depending on your requirements. It is your gateway from the login nodes to the compute nodes, whether you work interactively or via batch jobs.
  • It provides a framework for launching, monitoring, and otherwise managing jobs; a few commonly used commands are sketched below.
  • When more resources are requested than are available, Slurm schedules pending jobs, balances workloads, and manages the queue of jobs waiting to run on MOGON.
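
For day-to-day monitoring and management, a handful of Slurm commands cover most needs. The sketch below is only an orientation; the job ID is a placeholder.

# Show the partitions and the state of their nodes
sinfo

# List your own pending and running jobs
squeue -u $USER

# Show accounting information for a job (placeholder job ID)
sacct -j 1234567

# Cancel a job you no longer need (placeholder job ID)
scancel 1234567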

Submitting a Job

In the context of Slurm, a job is a work package, usually defined in a bash script. It contains

  • the resource requirements and metadata,
  • setup of the working environment, and
  • the list of tasks to be executed as job steps.

Here is an example:

#!/bin/bash

#========[ + + + + Requirements + + + + ]========#
#SBATCH --partition=smp
#SBATCH --account=<mogon-project>

#========[ + + + + Job Steps + + + + ]========#
srun echo "Hello, world!"

As demonstrated in the script above, resource requirements are passed to Slurm line by line, each indicated with the #SBATCH keyword. This script submits a job to MOGON’s smp partition, which is intended for jobs that require only a small number of CPUs, fewer than an entire node provides. The account that will be billed for the consumed resources is <mogon-project>.

Slurm will reject jobs that do not specify at least --partition and --account.

Since this is just a toy example, there is no need to load any software modules, so the setup of the working environment has been omitted.
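
In real jobs, the working environment is typically set up between the #SBATCH block and the job steps, most commonly by loading software modules. The following lines are only a sketch; the module name is a placeholder, so check what is actually available with module avail.

#========[ + + + + Environment + + + + ]========#
module purge                 # start from a clean environment
module load lang/Python      # placeholder module name, adjust to your software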

The srun command initiates a job step.
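
Each srun call within a batch script launches one job step, and a script may contain several of them, running one after another on the allocated resources. The sketch below is purely illustrative: ./preprocess and ./simulate are placeholder programs, and the second step assumes the job requested at least four tasks.

srun --ntasks=1 ./preprocess     # first job step, uses a single task
srun --ntasks=4 ./simulate       # second job step, uses four tasks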

The job is submitted to Slurm for processing with the command

sbatch <filename>
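
Assuming the script was saved as hello.slurm (both the filename and the job ID below are placeholders), a submission looks roughly like this:

sbatch hello.slurm
# Slurm confirms the submission and assigns a job ID:
#   Submitted batch job 1234567

# By default, the job's stdout and stderr are written to slurm-<jobid>.out
# in the directory the job was submitted from:
cat slurm-1234567.out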

Although this is a working minimal example, many optional Slurm parameters were left at their defaults. For better transparency, you should also specify at least the run time and the maximum number of tasks that Slurm is expected to run concurrently:

#!/bin/bash

#========[ + + + + Requirements + + + + ]========#
#SBATCH --partition=smp
#SBATCH --account=<mogon-project>

#SBATCH --time=0-00:10:00
#SBATCH --ntasks=1

#========[ + + + + Job Steps + + + + ]========#
srun echo "This message was sent from job $SLURM_JOB_ID on node $(hostname)"

References

Slurm Options

Other commonly used options can be found here.

Slurm Documentation

For a complete listing of options, refer to the official site.