May 2, 2025 in Guide, Example by Jens Rutten · 2 minutes
This example shows MOGON NHR and MOGON KI users with GPU access how to use PyTorch with GPU acceleration in an interactive job via an available container.
Run the following command to start an interactive job on MOGON NHR with the requested resources:
salloc -t 10 -p a40 --gres=gpu:2
Here, you are requesting:
-t 10 (a time limit of 10 minutes)
-p a40 (the a40 partition)
--gres=gpu:2 (two GPUs)
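To confirm from inside the allocation what was actually granted, a small script can print a few Slurm-related environment variables. This is only a sketch: the variable names below are ones Slurm commonly exports, but which of them are set depends on the Slurm version and site configuration, and the script name `show_alloc.py` is hypothetical.

```python
import os

# Variables that Slurm commonly exports inside an allocation or job step.
# Assumption: whether each one is set depends on the site configuration.
SLURM_VARS = ("SLURM_JOB_ID", "SLURM_JOB_PARTITION", "CUDA_VISIBLE_DEVICES")


def summarize_allocation(env=None):
    """Return a {name: value} dict, using '(not set)' for missing variables."""
    env = os.environ if env is None else env
    return {name: env.get(name, "(not set)") for name in SLURM_VARS}


if __name__ == "__main__":
    for name, value in summarize_allocation().items():
        print(f"{name}={value}")
```

Running it through `srun python3 show_alloc.py` shows the environment as a job step sees it, which may differ from the shell that `salloc` opened.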
Next, load the module that provides the container:
module purge
module use /apps/easybuild/ood/modules/all/
module load tools/JupyterLab/4.2.5_gpu_dev
module purge: Removes all loaded modules to avoid conflicts.
module use /apps/easybuild/ood/modules/all/: Adds the path for the required modules to the module search path.
module load tools/JupyterLab/4.2.5_gpu_dev: Loads the module that provides the environment variables and the container for JupyterLab, which includes PyTorch.
Run the following command to verify the PyTorch installation and GPU selection:
srun apptainer exec --nv $JUPYTERLAB_IMAGE python3 -c "import torch; print(f'PyTorch is installed: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print([(i, torch.cuda.get_device_properties(i)) for i in range(torch.cuda.device_count())])"
srun apptainer exec --nv: Runs the command in the specified container with NVIDIA GPU support.
$JUPYTERLAB_IMAGE: The environment variable containing the path to the JupyterLab container (see below).
python3 -c "...": Runs the Python code directly to display PyTorch and GPU properties.
The module tools/JupyterLab/4.2.5_gpu_dev provides $JUPYTERLAB_IMAGE, which is set to:
JUPYTERLAB_IMAGE=/apps/easybuild/ood/software/jupyterlab/4.2.5/mod_JupyterLab-4.2.5_gpu.sif
This container is based on the official Jupyter PyTorch Notebook image and is used in MOD's JupyterApp.
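Because the `srun apptainer exec` command depends on `$JUPYTERLAB_IMAGE` being set, it can help to check the variable before launching anything. The following is a minimal sketch of such a check; the function name `check_image` is hypothetical and not part of the module.

```python
import os
from pathlib import Path


def check_image(env=None):
    """Return (ok, message) describing whether $JUPYTERLAB_IMAGE is usable."""
    env = os.environ if env is None else env
    image = env.get("JUPYTERLAB_IMAGE")
    if not image:
        return False, "JUPYTERLAB_IMAGE is not set; is the module loaded?"
    if not Path(image).is_file():
        return False, f"Image not found: {image}"
    return True, f"Using container image: {image}"


if __name__ == "__main__":
    ok, message = check_image()
    print(message)
```

If the variable is missing, repeating the `module use` and `module load` steps in the current shell is the likely fix.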
You should get an output similar to the following:
PyTorch is installed: 2.5.1+cu121
CUDA available: True
[
(0, _CudaDeviceProperties(name='NVIDIA A40', major=8, minor=6, total_memory=45499MB, multi_processor_count=84, uuid=9ae31388-f709-b016-6fd3-3c9d613749d6, L2_cache_size=6MB)),
(1, _CudaDeviceProperties(name='NVIDIA A40', major=8, minor=6, total_memory=45499MB, multi_processor_count=84, uuid=923ad8b2-b993-af05-24aa-21885c053243, L2_cache_size=6MB))
]
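For anything beyond a quick check, the one-liner can be written as a short script that is easier to read and extend. The sketch below assumes a hypothetical file name `check_gpu.py`; it uses the same PyTorch calls as the one-liner and reports a few selected device properties instead of the full property object.

```python
def gpu_report():
    """Return report lines about PyTorch and the visible CUDA devices."""
    try:
        import torch
    except ImportError:
        # Outside the container PyTorch may be missing; report it instead of failing.
        return ["PyTorch is not installed in this environment"]

    lines = [
        f"PyTorch is installed: {torch.__version__}",
        f"CUDA available: {torch.cuda.is_available()}",
    ]
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        total_gib = props.total_memory / 1024**3  # total_memory is in bytes
        lines.append(
            f"GPU {i}: {props.name}, {total_gib:.1f} GiB, "
            f"compute capability {props.major}.{props.minor}"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(gpu_report()))
```

It would be run the same way as the one-liner, for example `srun apptainer exec --nv $JUPYTERLAB_IMAGE python3 check_gpu.py`.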
This container can be further customized or updated to meet specific needs. Please share suggestions or issues to help us enhance this resource.