Partitioning

Individual compute nodes are grouped into larger subsets of the cluster to form so-called partitions.

The tables on this page list key attributes of MOGON’s partitions. You can also query this information using the following Slurm command:

scontrol show partition <name-of-partition>

For example:

scontrol show partition parallel

This shows the default, minimum, and maximum settings for run time, memory capacity, etc. of the parallel partition.
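
In a batch script, the partition is selected with the --partition option. The following is a minimal sketch only; the job name, account, resource values, and executable are placeholders that need to be adapted to your project and workload:

#!/bin/bash
#SBATCH --job-name=myjob           # placeholder job name
#SBATCH --account=<your_account>   # placeholder: the Slurm account to charge
#SBATCH --partition=parallel       # partition chosen from the tables below
#SBATCH --nodes=1                  # example resource request
#SBATCH --time=0-06:00:00          # requested walltime, must not exceed the partition limit

srun ./my_program                  # placeholder executable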

Maximum runtime
The Limit column gives the maximum runtime you can request for a job in the respective partition. Jobs that need more time have to be split up and continued in a separate job; one way to do this is sketched below.
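
One common pattern, assuming your application can write a checkpoint and restart from it, is to chain the parts with a Slurm job dependency (job_part1.sh and job_part2.sh are hypothetical scripts):

JOBID=$(sbatch --parsable job_part1.sh)            # submit the first part and capture its job ID
sbatch --dependency=afterok:${JOBID} job_part2.sh  # second part starts only after the first finished successfully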

MOGON NHR

| Partition | Nodes | Limit | RAM | Intended Use |
|-----------|-------|-------|-----|--------------|
| smallcpu | CPU-Nodes | 6 days | $256\thinspace\text{GB}$, $512\thinspace\text{GB}$ | for jobs using CPUs $\ll 128$; max. running jobs per user: $3\text{k}$ |
| parallel | CPU-Nodes | 6 days | $256\thinspace\text{GB}$, $512\thinspace\text{GB}$ | jobs using $\text{n}$ nodes, CPUs $\text{n}\times128$ for $\text{n}\in[1,2,\ldots]$ |
| longtime | CPU-Nodes | 12 days | $256\thinspace\text{GB}$, $512\thinspace\text{GB}$ | long running jobs $\ge \text{6 days}$ |
| largemem | CPU-Nodes | 6 days | $1024\thinspace\text{GB}$ | memory requirement |
| hugemem | CPU-Nodes | 6 days | $2048\thinspace\text{GB}$ | memory requirement |

Partitions supporting Accelerators

| Partition | Nodes | Limit | RAM | Intended Use |
|-----------|-------|-------|-----|--------------|
| mi250 | AMD-Nodes | 6 days | $1024\thinspace\text{GB}$ | GPU requirement |
| smallgpu | A40 | 6 days | $1024\thinspace\text{GB}$ | GPU requirement |
| a100dl | A100 | 6 days | $1024\thinspace\text{GB}$ | GPU requirement |
| a100ai | A100 | 6 days | $2048\thinspace\text{GB}$ | GPU requirement |
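
To use the accelerators in these partitions, GPUs have to be requested explicitly in addition to selecting the partition. The following is a sketch using the generic Slurm GRES option; the partition name and GPU count are examples only, and the exact GRES specification may differ:

#SBATCH --partition=a100dl   # example accelerator partition
#SBATCH --gres=gpu:2         # example: request two GPUs on the node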

Private Partitions within MOGON NHR

| Partition | Nodes | Limit | RAM | Accelerators |
|-----------|-------|-------|-----|--------------|
| topml | gpu0601 | 6 days | $1\thinspace\text{TB}$ | NVIDIA H100 80GB HBM3 |
| komet | floating partition | 6 days | $256\thinspace\text{GB}$ | - |
| czlab | gpu0602 | 6 days | $1.5\thinspace\text{TB}$ | NVIDIA L40 |

MOGON II

  • Only ~5% of the nodes are available for small jobs ($\text{n} \ll 40$).
  • Each account has a GrpTRESRunLimit.

You can check this limit using

sacctmgr -s list account <your_account> format=Account,GrpTRESRunMin

To list your accounts, use

sacctmgr -n -s list user $USER format=Account%20 | grep -v none

The default is cpu=22982400, which is equivalent to using 700 nodes for 12 hours in total.

| Partition | Nodes | Limit | RAM | Interconnect | Intended Use |
|-----------|-------|-------|-----|--------------|--------------|
| smp | z-nodes, x-nodes | 5 days | $64\thinspace\text{GB}$, $96\thinspace\text{GB}$, $128\thinspace\text{GB}$, $192\thinspace\text{GB}$, $256\thinspace\text{GB}$ | Intel Omni-Path | for jobs using CPUs $\text{n} \ll 40$ or $\text{n} \ll 64$; max. running jobs per user: $3\text{k}$ |
| devel | z-nodes, x-nodes | 4 hours | $64\thinspace\text{GB}$, $96\thinspace\text{GB}$, $128\thinspace\text{GB}$ | Intel Omni-Path | max. 2 jobs per user, max. 320 CPUs in total |
| parallel | z-nodes, x-nodes | 5 days | $64\thinspace\text{GB}$, $96\thinspace\text{GB}$, $128\thinspace\text{GB}$, $192\thinspace\text{GB}$, $256\thinspace\text{GB}$ | Intel Omni-Path | jobs using $\text{n}$ nodes, CPUs $\text{n}\times40$ or $\text{n}\times64$ for $\text{n}\in[1,2,3,\ldots]$ |
| bigmem | z-nodes, x-nodes | 5 days | $384\thinspace\text{GB}$, $512\thinspace\text{GB}$, $1\thinspace\text{TB}$, $1.5\thinspace\text{TB}$ | Intel Omni-Path | for jobs needing more than $256\thinspace\text{GB}$ of memory |
| longtime | z-nodes, x-nodes | 12 days | $64\thinspace\text{GB}$, $96\thinspace\text{GB}$, $128\thinspace\text{GB}$, $192\thinspace\text{GB}$, $256\thinspace\text{GB}$, $384\thinspace\text{GB}$, $512\thinspace\text{GB}$, $1\thinspace\text{TB}$, $1.5\thinspace\text{TB}$ | Intel Omni-Path | for jobs needing more than 5 days walltime |

Partitions supporting Accelerators

| Partition | Nodes | Limit | Interconnect | Accelerators | Comment |
|-----------|-------|-------|--------------|--------------|---------|
| deeplearning | dgx-nodes | 18 hours | InfiniBand | 8 Tesla V100-SXM2 per node | for access |
| m2_gpu | s-nodes | 5 days | InfiniBand | 6 GeForce GTX 1080 Ti per node | - |

Private Partitions within MOGON II

| Partition | Nodes | Limit | RAM | Interconnect | Accelerators |
|-----------|-------|-------|-----|--------------|--------------|
| himster2_exp | x0753 - x0794, x2001 - x2023 | 5 days | $96\thinspace\text{GB}$ | Intel Omni-Path | - |
| himster2_th | x2024 - x2320 | 5 days | $96\thinspace\text{GB}$ | Intel Omni-Path | - |

Hidden Partitions

Information on hidden partitions can be viewed by anyone. These partitions are hidden only to avoid cluttering the output of every query: they are “private” to certain projects or groups and of interest only to those groups.

To display all jobs of a user across all partitions, including hidden ones, supply the -a flag:

squeue -u $USER -a

Likewise, sinfo can be supplemented with -a to gather information on hidden partitions. All other commands work as expected without this flag.
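
For example, to list the state of all partitions, including hidden ones:

sinfo -a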