Partitioning
Individual compute nodes are grouped together into larger subsets of the cluster to form so-called partitions.
The tables on this page list key attributes of MOGON’s partitions. You can also query this information directly with Slurm, for example to show the default, minimum, and maximum settings for run time, memory capacity, etc. of the `parallel` partition.
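One way to query this from the command line (a sketch using standard Slurm tooling; the partition name is just an example):

```shell
# Show limits and defaults of the "parallel" partition,
# including time limit, default memory, and node list:
scontrol show partition parallel

# A compact per-partition overview (time limits, node counts, state):
sinfo --summarize
```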
MOGON NHR
Partition | Nodes | Limit | RAM | Intended Use |
---|---|---|---|---|
smallcpu | CPU-Nodes | 6 days | $\space256\thinspace\text{GB}$ $\space512\thinspace\text{GB}$ | for jobs using CPUs $\ll 128$; max. running jobs per user: $3\text{k}$ |
parallel | CPU-Nodes | 6 days | $\space256\thinspace\text{GB}$ $\space512\thinspace\text{GB}$ | jobs using $\text{n}$ nodes, CPUs $\text{n}\times128$ for $\text{n}\in[1,2,\ldots]$ |
longtime | CPU-Nodes | 12 days | $\space256\thinspace\text{GB}$ $\space512\thinspace\text{GB}$ | long running jobs $\ge \text{6 days}$ |
largemem | CPU-Nodes | 6 days | $\space1024\thinspace\text{GB}$ | for jobs with high memory requirements |
hugemem | CPU-Nodes | 6 days | $\space2048\thinspace\text{GB}$ | for jobs with high memory requirements |
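For illustration, a job script targeting the `parallel` partition above might look like the following sketch (the account placeholder, walltime, and binary name are assumptions, not taken from this page):

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=parallel
#SBATCH --nodes=2                 # n nodes, CPUs n x 128
#SBATCH --ntasks-per-node=128     # full nodes, matching the partition's intent
#SBATCH --time=1-00:00:00         # well below the 6-day limit
#SBATCH --account=<your_account>  # placeholder, replace with your account

srun ./my_program                 # placeholder binary
```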
Partitions supporting Accelerators
Partition | Nodes | Limit | RAM | Intended Use |
---|---|---|---|---|
mi250 | AMD-Nodes | 6 days | $\space1024\thinspace\text{GB}$ | for jobs requiring GPUs |
smallgpu | A40 | 6 days | $\space1024\thinspace\text{GB}$ | for jobs requiring GPUs |
a100dl | A100 | 6 days | $\space1024\thinspace\text{GB}$ | for jobs requiring GPUs |
a100ai | A100 | 6 days | $\space2048\thinspace\text{GB}$ | for jobs requiring GPUs |
Private Partitions within MOGON NHR
Partition | Nodes | Limit | RAM | Accelerators |
---|---|---|---|---|
topml | gpu0601 | 6 days | $1\thinspace\text{TB}$ | NVIDIA H100 80GB HBM3 |
komet | floating partition | 6 days | $256\thinspace\text{GB}$ | - |
czlab | gpu0602 | 6 days | $1.5\thinspace\text{TB}$ | NVIDIA L40 |
MOGON II
- Only ~5% of nodes are available for small jobs ($\text{n} \ll 40$).
- Each account has a `GrpTRESRunLimit`. Check it using `sacctmgr -s list account <your_account> format=Account,GrpTRESRunMin`; to find your accounts, you can use `sacctmgr -n -s list user $USER format=Account%20 | grep -v none`. The default is `cpu=22982400` (CPU-minutes), which is the equivalent of using 700 nodes for 12 hours in total.
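Put together, the quota check above can be run as follows (the account placeholder must be replaced with one of your own accounts):

```shell
# List your Slurm accounts, dropping entries without an association:
sacctmgr -n -s list user $USER format=Account%20 | grep -v none

# Show the running-job resource limit (GrpTRESRunMins) for one account:
sacctmgr -s list account <your_account> format=Account,GrpTRESRunMin
```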
Partition | Nodes | Limit | RAM | Interconnect | Intended Use |
---|---|---|---|---|---|
smp | z-nodes x-nodes | 5 days | $\space64\thinspace\text{GB}$ $\space96\thinspace\text{GB}$ $128\thinspace\text{GB}$ $192\thinspace\text{GB}$ $256\thinspace\text{GB}$ | Intel Omnipath | for jobs using CPUs $\text{n} \ll 40$ or $\text{n} \ll 64$; max. running jobs per user: $3\text{k}$ |
devel | z-nodes x-nodes | 4 hours | $\space64\thinspace\text{GB}$ $\space96\thinspace\text{GB}$ $128\thinspace\text{GB}$ | Intel Omnipath | max. 2 jobs per user, max. 320 CPUs in total |
parallel | z-nodes x-nodes | 5 days | $\space64\thinspace\text{GB}$ $\space96\thinspace\text{GB}$ $128\thinspace\text{GB}$ $192\thinspace\text{GB}$ $256\thinspace\text{GB}$ | Intel Omnipath | jobs using $\text{n}$ nodes, CPUs $\text{n}\times40$ or $\text{n}\times64$ for $\text{n}\in[1,2,3,\ldots]$ |
bigmem | z-nodes x-nodes | 5 days | $384\thinspace\text{GB}$ $512\thinspace\text{GB}$ $1\thinspace\text{TB}$ $1.5\thinspace\text{TB}$ | Intel Omnipath | for jobs needing more than $256\thinspace\text{GB}$ of memory |
longtime | z-nodes x-nodes | 12 days | $\space64\thinspace\text{GB}$ $\space96\thinspace\text{GB}$ $128\thinspace\text{GB}$ $192\thinspace\text{GB}$ $256\thinspace\text{GB}$ $384\thinspace\text{GB}$ $512\thinspace\text{GB}$ $1\thinspace\text{TB}$ $1.5\thinspace\text{TB}$ | Intel Omnipath | for jobs needing more than 5 days walltime |
Partitions supporting Accelerators
Partition | Nodes | Limit | Interconnect | Accelerators | Comment |
---|---|---|---|---|---|
deeplearning | dgx-nodes | 18 hours | Infiniband | 8 Tesla V100-SXM2 per node | for access |
m2_gpu | s-nodes | 5 days | Infiniband | 6 GeForce GTX 1080 Ti per node | - |
Private Partitions within MOGON II
Partition | Nodes | Limit | RAM | Interconnect | Accelerators |
---|---|---|---|---|---|
himster2_exp | x0753 - x0794, x2001 - x2023 | 5 days | $96\thinspace\text{GB}$ | Intel Omnipath | - |
himster2_th | x2024 - x2320 | 5 days | $96\thinspace\text{GB}$ | Intel Omnipath | - |
Hidden Partitions
Information on hidden partitions can be viewed by anyone. These partitions are hidden to avoid cluttering the output of every query; they are “private” to certain projects / groups and of interest only to those groups.
To view all jobs for a user across all partitions, including hidden ones, supply the `-a` flag. Likewise, `sinfo` can be supplemented with `-a` to gather this information. All other commands work as expected without this flag.
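For example (standard Slurm flags, not specific to MOGON):

```shell
# Show your jobs in all partitions, including hidden ones:
squeue -a -u $USER

# Include hidden partitions in the cluster overview:
sinfo -a
```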