Using GPUs

Using graphics processing units for computations on MOGON

GPU Partitions

On MOGON, GPU tasks can only be run on the following partitions that have GPU resources available:

Partition  Hosts           GPUs                   RAM
mi250      gpu[0101-0102]  AMD MI250              1024 GB
a40        gpu[0301-0307]  Nvidia Tesla A40 48G   1024 GB
a100ai     gpu[0201-0204]  Nvidia Tesla A100 80G  2048 GB
a100dl     gpu[0001-0011]  Nvidia Tesla A100 40G  1024 GB
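You can ask sinfo how many GPUs the nodes of each partition offer: its %G output field shows a node's generic resources (GRES), typically in a form such as gpu:4 or gpu:a40:4(S:0-1). The sketch below extracts the GPU count from such a string. The helper name gres_gpu_count is hypothetical, and the exact GRES strings on MOGON are an assumption, so compare them with your own sinfo output.

```shell
#!/bin/bash
# Hypothetical helper: print the GPU count contained in a Slurm GRES string,
# e.g. "gpu:4" or "gpu:a40:4(S:0-1)". Prints 0 if no gpu entry is found.
# Assumption: the count is the last colon-separated field of the gpu entry.
gres_gpu_count() {
    local gres entry
    local -a entries
    # Drop parenthesised information such as "(S:0-1)" or "(null)".
    gres=$(sed 's/([^)]*)//g' <<< "$1")
    # Split the remaining string into comma-separated GRES entries.
    IFS=',' read -r -a entries <<< "$gres"
    for entry in "${entries[@]}"; do
        if [[ $entry == gpu:* ]]; then
            echo "${entry##*:}"   # last field is the count
            return 0
        fi
    done
    echo 0
}

# Usage on a login node (skipped where sinfo is not installed):
if command -v sinfo >/dev/null 2>&1; then
    for p in mi250 a40 a100ai a100dl; do
        # -h: no header, %G: generic resources of each node group
        sinfo -h -p "$p" -o "%G" | while read -r gres; do
            echo "$p: $(gres_gpu_count "$gres") GPUs per node"
        done
    done
fi
```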

GPU Node Misuse

Running calculations on GPU nodes without using their GPUs is prohibited! We reserve the right to terminate accounts that abuse these resources.

Access

To find out which account to use for the m2_gpu partition, log in and run:

sacctmgr list user $USER -s where Partition=m2_gpu format=User%10,Account%20,Partition%10

Every account listed with Partition=m2_gpu can be used to submit jobs to that GPU partition. To query other partitions, replace m2_gpu with the partition you are interested in.
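To check all GPU partitions from the table at once, the same sacctmgr query can be run in a loop. A minimal sketch, assuming the partition names listed above; -n suppresses the header and -P prints |-separated fields. The function name list_gpu_accounts is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list the accounts usable by $USER on each given
# partition. Output lines have the form "<partition>: <account>".
list_gpu_accounts() {
    local p account partition
    for p in "$@"; do
        # -n: no header, -P: parsable output separated by '|'
        sacctmgr -n -P list user "$USER" -s where Partition="$p" \
            format=Account,Partition |
        while IFS='|' read -r account partition; do
            echo "$partition: $account"
        done
    done
}

# Query all GPU partitions (skipped where sacctmgr is not installed).
if command -v sacctmgr >/dev/null 2>&1; then
    list_gpu_accounts mi250 a40 a100ai a100dl
fi
```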

Limitations

All GPU partitions on MOGON NHR and MOGON KI have a time limit of 6 days per job. To prevent single users or groups from flooding an entire partition with long-running jobs, an additional limit has been set so that other users get a chance to run their jobs, too.

As a result, jobs may remain pending with so-called pending reasons such as QOSGrpGRESRunMinutes. For other pending reasons, see our page on job management.
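To get a feel for how quickly a group approaches such a limit, it helps to estimate how many GPU-minutes a job accounts for. A rough sketch, assuming the limit behind QOSGrpGRESRunMinutes counts allocated GPUs multiplied by the time limit of running jobs (as far as we know, Slurm uses the remaining time limit, so the value at job start is an upper bound). The actual limit configured on MOGON is not stated here, and the helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert a Slurm time limit to whole minutes.
# Accepted forms: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S.
# Leftover seconds are rounded up to a full minute (a conservative choice).
time_to_minutes() {
    local t=$1 days=0 h=0 m=0 s=0
    local -a parts
    if [[ $t == *-* ]]; then
        days=${t%%-*}
        # After the dash, fields are hours[:minutes[:seconds]].
        IFS=':' read -r h m s <<< "${t#*-}"
    else
        IFS=':' read -r -a parts <<< "$t"
        case ${#parts[@]} in
            1) m=${parts[0]} ;;
            2) m=${parts[0]}; s=${parts[1]} ;;
            3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
        esac
    fi
    # 10# forces base 10 so that values like "08" are not read as octal.
    echo $(( 10#$days * 1440 + 10#${h:-0} * 60 + 10#${m:-0} + (10#${s:-0} > 0 ? 1 : 0) ))
}

# Hypothetical helper: GPU-minutes = number of GPUs x time limit in minutes.
gpu_minutes() {
    echo $(( $1 * $(time_to_minutes "$2") ))
}

# Example: 4 GPUs for the maximum of 6 days vs. 4 GPUs for 1 day.
echo "6 days, 4 GPUs: $(gpu_minutes 4 6-00:00:00) GPU-minutes"
echo "1 day,  4 GPUs: $(gpu_minutes 4 1-00:00:00) GPU-minutes"
```

Under this assumption, a job requesting four GPUs for the full 6 days accounts for 34560 GPU-minutes, six times as much as the same job with a 1-day limit, so requesting realistic time limits can help jobs start sooner.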

Submitting to the GPU Partitions

To use a GPU you have to explicitly reserve it as a resource in the submission script:

#!/bin/bash
# ... other SBATCH statements
#SBATCH --gres=gpu:<number>
#SBATCH -p <appropriate partition>

<number> can range from 1 to 6, depending on the partition. Requesting more than one GPU only makes sense if your application can actually use that many GPUs.
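Inside a running job, you can check that the number of visible GPUs matches what you requested before starting an expensive run. A minimal sketch that counts the entries of CUDA_VISIBLE_DEVICES, which Slurm exports for GPU jobs; the function names are hypothetical, and the fallback to ROCR_VISIBLE_DEVICES for the AMD nodes of the mi250 partition is an assumption we have not verified for MOGON.

```shell
#!/bin/bash
# Hypothetical helper: count the GPUs visible to this job.
# Assumption: CUDA_VISIBLE_DEVICES is set on Nvidia nodes; on AMD nodes
# ROCR_VISIBLE_DEVICES may be set instead (not verified for MOGON).
count_visible_gpus() {
    local list=${CUDA_VISIBLE_DEVICES:-${ROCR_VISIBLE_DEVICES:-}}
    local -a devices
    IFS=',' read -r -a devices <<< "$list"
    echo "${#devices[@]}"
}

# Hypothetical guard: fail unless at least $1 GPUs are visible.
require_gpus() {
    local have
    have=$(count_visible_gpus)
    if (( have < $1 )); then
        echo "Expected $1 GPU(s), but only $have visible" >&2
        return 1
    fi
}

# In a job script, e.g. after "#SBATCH --gres=gpu:4":
#   require_gpus 4 || exit 1
```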

The option --gres-flags=enforce-binding currently does not work properly with our Slurm version. You may try it with multi-task GPU jobs, but it will not work for jobs that reserve only part of a node. SchedMD appears to be working on a fix.

Simple Single GPU Job

Take a single GPU and run an executable on it.

#!/bin/bash
#-----------------------------------------------------------------
# Example Slurm job script to run a single-GPU application on MOGON.
#
# This script requests one task using 2 cores and 1 GPU on one GPU node.
#-----------------------------------------------------------------

#SBATCH -J mysimplegpujob        # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p a40                   # Partition name
#SBATCH -n 1                     # Total number of tasks
#SBATCH -c 2                     # CPUs per task
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:1             # Reserve 1 GPU
#SBATCH -A <mogon-project>       # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA

# Launch the executable
srun <myexecutable>

Simple Full Node GPU Job

Take a complete GPU node and run an executable that uses all 4 GPUs. Make sure that your application can use more than one GPU if you request more than one!

#!/bin/bash
#-----------------------------------------------------------------
# Example Slurm job script to run a multi-GPU application on MOGON.
#
# This script requests one task using 64 cores and all 4 GPUs
# on one GPU node.
#-----------------------------------------------------------------

#SBATCH -J mysimplegpujob        # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p a40                   # Partition name
#SBATCH -N 1                     # Total number of nodes requested (128 cores per GPU node)
#SBATCH -n 1                     # Total number of tasks
#SBATCH -c 64                    # CPUs per task
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:4             # Reserve 4 GPUs

#SBATCH -A <mogon-project>       # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA

# Launch the executable
srun <myexecutable>

Multi-task GPU-Job

Take a full GPU node and run 4 executables, each on its own GPU.

#!/bin/bash
#-----------------------------------------------------------------
# Example Slurm job script to run several single-GPU tasks on MOGON.
#
# This script requests 4 tasks with 16 cores each and 4 GPUs on one
# node, and starts one job step per GPU.
#-----------------------------------------------------------------

#SBATCH -J mysimplegpujob        # Job name
#SBATCH -o mysimplegpujob.%j.out # Specify stdout output file (%j expands to jobId)
#SBATCH -p a40                   # Partition name
#SBATCH -N 1                     # Total number of nodes requested (128 cores per GPU node)
#SBATCH -n 4                     # Total number of tasks
#SBATCH -c 16                    # CPUs per task
#SBATCH -t 00:30:00              # Run time (hh:mm:ss) - 0.5 hours
#SBATCH --gres=gpu:4             # Reserve 4 GPUs

#SBATCH -A <mogon-project>       # Specify allocation to charge against

# Load all necessary modules if needed (these are examples)
# Loading modules in the script ensures a consistent environment.
module load system/CUDA

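# Launch one single-GPU task per allocated GPU.
# SLURM_JOB_GPUS holds a comma-separated list of GPU IDs, e.g. 0,1,2,3.
IFS=',' read -r -a JOB_GPUS <<< "$SLURM_JOB_GPUS"
GPUTASKS=${#JOB_GPUS[@]}
for ((i=0; i<GPUTASKS; i++))
do
   echo "TASK $i"
   srun -n 1 -c "$SLURM_CPUS_PER_TASK" --exclusive --gres=gpu:1 --mem=18G <executable> &
done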

wait
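A bare wait without arguments always returns 0, even if one of the background tasks failed, so the job script itself ends successfully. A minimal sketch of waiting on each task's PID separately and counting failures; run_task and launch_all are hypothetical names, and run_task only stands in for the srun line of the job script.

```shell
#!/bin/bash
# Hypothetical per-task command; in the job script above this would be
#   srun -n 1 -c "$SLURM_CPUS_PER_TASK" --exclusive --gres=gpu:1 --mem=18G <executable>
run_task() {
    echo "task $1 running"
}

# Start $1 tasks in the background, wait for each PID separately and
# return the number of tasks that exited with a non-zero status.
launch_all() {
    local -a pids=()
    local i pid failed=0
    for ((i = 0; i < $1; i++)); do
        run_task "$i" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

launch_all 4
echo "failed tasks: $?"
```

In a job script, launch_all "$GPUTASKS" followed by exit $? would take the place of the loop and the final wait, so that failed tasks show up in the job's exit code (which is limited to 0-255).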

Ignorant Applications

Most GPU programs select the right device on their own, but some do not. In either case, Slurm exports the environment variable CUDA_VISIBLE_DEVICES, which holds the comma-separated list of devices the job is allowed to use, numbered from 0.

For instance, if another job occupies the first device and your job requests two GPUs, CUDA_VISIBLE_DEVICES might hold the value 1,2. You can read this into an array using a so-called here string:

# read splits its input at IFS; the IFS=',' prefix applies only to this
# read command, so the global IFS setting stays untouched
IFS=',' read -r -a devices <<< "$CUDA_VISIBLE_DEVICES"

Now you can point each instance of your application to its own device (assuming you start two single-GPU instances rather than one application that uses both GPUs):

cmd --argument_which_receives_the_device "${devices[0]}" &  # first device, e.g. 1
cmd --argument_which_receives_the_device "${devices[1]}" &  # second device, e.g. 2
wait  # wait for both instances before the job script ends
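With more than two devices, the explicit per-device lines generalize to a loop that starts one instance per device listed in CUDA_VISIBLE_DEVICES and then waits for all of them. A sketch; cmd and --argument_which_receives_the_device are the same placeholders as above (here cmd is a stub that only echoes its device), and start_per_device is a hypothetical name.

```shell
#!/bin/bash
# Placeholder for your application; it only echoes the device it was given.
cmd() {
    echo "instance on device $2"
}

# Start one background instance of cmd per device in CUDA_VISIBLE_DEVICES.
start_per_device() {
    local -a devices
    local dev
    IFS=',' read -r -a devices <<< "${CUDA_VISIBLE_DEVICES:-}"
    for dev in "${devices[@]}"; do
        cmd --argument_which_receives_the_device "$dev" &
    done
    wait   # block until all instances have finished
}

CUDA_VISIBLE_DEVICES=1,2 start_per_device
```

The instances run concurrently, so their output may appear in any order.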