Using GPUs

Using graphics processing units for computations on MOGON

GPU Partitions

On MOGON, GPU jobs can only be run on clusters that have GPU resources available. The cluster has to be specified explicitly by choosing one of its GPU partitions. The following partitions on MOGON NHR have GPU resources available:

Partition      Hosts            GPUs                             RAM
mi250          gpu[0101-0102]   AMD MI250                        1024 GB
a40            gpu[0301-0307]   Nvidia Tesla A40 48G             1024 GB
a100dl         gpu[0001-0011]   Nvidia Tesla A100 40G            1024 GB
a100ai         gpu[0201-0204]   Nvidia Tesla A100 80G            2048 GB

In addition, a number of public partitions of the MOGON II cluster support GPU usage:

Partition      Hosts            GPUs                             RAM
deeplearning   dgx[01-02]       Nvidia V100 16G/32G              512 GB
m2_gpu         s[0001-0030]     6x Nvidia GeForce GTX 1080 Ti    128 GB
GPU node misuse
Calculating on GPU nodes without using the accelerators / GPUs is prohibited! We reserve the right to terminate an account for abuse of these resources.

Access

To find out which account to use for the m2_gpu partition, log in and call:

sacctmgr list user $USER -s where Partition=m2_gpu format=User%10,Account%20,Partition%10

All accounts that show Partition=m2_gpu can be used to submit jobs to the GPU partition. To find information about other partitions, replace m2_gpu with the partition you are interested in.
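
For example, to check which of your accounts may submit to the a40 partition (the partition name here is only an illustration, substitute the one you need):

sacctmgr list user $USER -s where Partition=a40 format=User%10,Account%20,Partition%10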

Every group that is interested in using these GPUs and does not already have access can apply for it via the AHRP website (currently only MOGON II).

Limitations

MOGON NHR

All GPU partitions on MOGON NHR have a time limit of 6 days for all jobs. To prevent single users or groups from flooding the entire partition with long-running jobs, a limit has been set so that other users also get the chance to run their jobs.

This may result in jobs not starting due to so-called pending reasons such as QOSGrpGRESRunMinutes. For other pending reasons, see our page on job management.
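
To see which pending reason Slurm currently assigns to your jobs, you can query the reason column with squeue; a minimal sketch (the format string is only one possible choice):

squeue -u $USER -o "%.12i %.12P %.10T %r"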

MOGON II

The m2_gpu partition allows a runtime of up to 5 days. To prevent single users or groups from flooding the entire partition with long-running jobs, a limit has been set so that other users also get the chance to run their jobs.

This may result in pending reasons such as QOSGrpGRESRunMinutes. For other pending reasons, see our page on job management.

Unlike the login nodes, the s-nodes have Intel CPUs, which means that you have to compile your code on the GPU nodes; otherwise you may end up with illegal instruction errors or similar.
There is a partition m2_gpu-compile which allows one job per user with a maximum of 8 cores on 1 CPU and --mem=18000M for compiling your code. The maximum runtime for compile jobs is 60 minutes.
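
A batch script for such a compile job could look roughly like this (project name, module, and build command are placeholders):

#!/bin/bash
#SBATCH -J compilejob            # Job name
#SBATCH -p m2_gpu-compile        # Compile partition
#SBATCH -n 1                     # One task
#SBATCH -c 8                     # Up to 8 cores
#SBATCH --mem=18000M             # Memory limit of the compile partition
#SBATCH -t 01:00:00              # Maximum runtime for compile jobs
#SBATCH -A <mogon-project>       # Placeholder: your allocation

# Placeholder environment and build command
module load system/CUDA
make -j $SLURM_CPUS_PER_TASK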

Submitting to the GPU-Partitions

To use a GPU you have to explicitly reserve it as a resource in the submission script:

#!/bin/bash
# ... other SBATCH statements
#SBATCH --gres=gpu:<number>
#SBATCH -p <appropriate partition>

<number> can be anything from 1 to 6 on our GPU nodes, depending on the partition. To use more than 1 GPU, the application of course needs to support multiple GPUs.
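
If you are unsure how many GPUs the nodes of a partition provide, you can list their generic resources (GRES) with sinfo, for example (the partition name is only an illustration):

sinfo -p a40 -o "%N %G"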

--gres-flags=enforce-binding is currently not working properly in our Slurm version. You may try to use it with a multi-task GPU job, but it will not work with jobs reserving only part of a node. SchedMD appears to be working on a bug fix.

Simple single GPU-Job

Take a single GPU node and run an executable on it.¹

#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run serial applications on MOGON.
#
# This script requests one task using 2 cores on one GPU-node.  
#-----------------------------------------------------------------

#SBATCH -J mysimplegpujob        # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p a40                   # Partition name
#SBATCH -n 1                     # Total number of tasks
#SBATCH -c 2                     # CPUs per task
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:1             # Reserve 1 GPU
#SBATCH -A <mogon-project>       # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA

# Launch the executable
srun <myexecutable>
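
Assuming you saved the script above as mysimplegpujob.sh (the filename is only an example), you submit it as usual with:

sbatch mysimplegpujob.sh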

Simple full node GPU-Job

Take a full GPU node and run an executable that uses all 4 GPUs.²

#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run serial applications on MOGON.
#
# This script requests one task using 64 cores on one node.
# The job will have access to all 4 GPUs in the node.
#-----------------------------------------------------------------

#SBATCH -J mysimplegpujob        # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p a40                   # Partition name
#SBATCH -N 1                     # Total number of nodes requested (128 cores per GPU node)
#SBATCH -n 1                     # Total number of tasks
#SBATCH -c 64                    # CPUs per task
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:4             # Reserve 4 GPUs

#SBATCH -A <mogon-project>       # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA

# Launch the executable
srun <myexecutable>

Multi-task GPU-Job

Take a full GPU node and run four executables, each on one GPU.

#!/bin/bash
#-----------------------------------------------------------------
# Example SLURM job script to run serial applications on MOGON.
#
# This script requests four tasks with 16 cores each on one node.
# The job will have access to all 4 GPUs in the node.
#-----------------------------------------------------------------

#SBATCH -J mysimplegpujob        # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p a40                   # Partition name
#SBATCH -N 1                     # Total number of nodes requested (128 cores per GPU node)
#SBATCH -n 4                     # Total number of tasks
#SBATCH -c 16                    # CPUs per task
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:4             # Reserve 4 GPUs

#SBATCH -A <mogon-project>       # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA

# Launch the tasks
GPUTASKS=$(( $(grep -o ',' <<< "$SLURM_JOB_GPUS" | wc -l) + 1 )) # number of GPUs = number of commas + 1
for ((i=0; i<GPUTASKS; i++))
do
   echo "TASK $i"
   srun -n 1 -c $SLURM_CPUS_PER_TASK --exclusive --gres=gpu:1 --mem=18G <executable> &
done

wait

Ignorant Applications

Most GPU programs just know which device to select. Some do not. In any case SLURM exports the environment variable CUDA_VISIBLE_DEVICES, which simply holds the comma-separated, enumerated devices allowed in a job environment, starting from 0.

So if, for instance, another job occupies the first device and your job selects two GPUs, CUDA_VISIBLE_DEVICES might hold the value 1,2, and you can read this into an array (with a so-called HERE string):

# good practice is to store the initial IFS setting:
IFSbck=$IFS
IFS=',' read -a devices <<< "$CUDA_VISIBLE_DEVICES"
IFS=$IFSbck # restore it in case IFS is used in subsequent code

Now you can point your applications to the respective devices (assuming you start two applications rather than one that uses both):

cmd --argument_which_receives_the_device ${devices[0]} & # will use the 1st device
cmd --argument_which_receives_the_device ${devices[1]} & # will use the 2nd device
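
If you start one process per visible device, a loop over the array keeps this generic (a sketch; cmd and its argument are placeholders as above):

for dev in "${devices[@]}"; do
    cmd --argument_which_receives_the_device "$dev" &
done
wait # wait for all background processes to finish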

Footnotes


  1. Be sure that you set the amount of memory appropriately! ↩︎

  2. Be sure that your application can utilize more than 1 GPU, if you request it! ↩︎