Memory Limits
When submitting a job to Slurm, it’s essential to set an appropriate memory limit so that your job has enough memory to run. By default, Slurm sets a relatively small memory limit that depends on the partition; the defaults are listed in the table in the next section.
Binary prefixes are often written as Ki, Mi, Gi, … (kibi, mebi, gibi) to distinguish them from their decimal counterparts (kilo, mega, giga). Slurm does not follow this convention: it uses the decimal prefixes but always means units based on powers of 2 (so 1 kB corresponds to 1024 bytes).
To be consistent with Slurm’s documentation, we also stick to the standard SI prefixes despite the ambiguity.
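As a quick illustration of this convention, the following sketch converts a Slurm-style size into bytes using powers of 2. The function name `slurm_mem_to_bytes` is hypothetical and not part of Slurm; it only handles the uppercase suffixes K, M, G and T and treats a bare number as megabytes.

```shell
#!/bin/bash
# Convert a Slurm-style memory size such as 4G or 1930 to bytes.
# K, M, G, T are read as powers of 2; a bare number is taken as megabytes.
# Hypothetical helper for illustration only, not a Slurm command.
slurm_mem_to_bytes() {
    local spec="$1"
    local number="${spec%[KMGT]}"    # numeric part: drop one trailing unit letter
    local unit="${spec#"$number"}"   # the unit letter, or empty if none was given
    local exp
    case "$unit" in
        K)    exp=10 ;;
        M|"") exp=20 ;;
        G)    exp=30 ;;
        T)    exp=40 ;;
    esac
    echo $(( number << exp ))
}

slurm_mem_to_bytes 1K      # 1024
slurm_mem_to_bytes 1930    # 2023751680
slurm_mem_to_bytes 4G      # 4294967296
```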
Default Memory Size
MOGON NHR
| Partition | Memory [MB] | Scope |
|---|---|---|
| smallcpu | $1930$ | per CPU |
| parallel | $248000$ | per Node |
| longtime | $1930$ | per CPU |
| largemem | $7930$ | per CPU |
| hugemem | $15560$ | per CPU |
| a40 | $7930$ | per CPU |
| a100dl | $7930$ | per CPU |
| a100ai | $15560$ | per CPU |
| topml | $2000$ | per CPU |
| komet | $1930$ | per CPU |
| czlab | $7930$ | per CPU |
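For partitions with a per-CPU default, the limit a job receives without any memory option should scale with the number of allocated CPUs. The sketch below does that arithmetic with values from the table; the function name `default_job_mem_mb` and the CPU counts are illustrative assumptions.

```shell
#!/bin/bash
# Default memory limit in MB for a job on a partition with a per-CPU default.
# Hypothetical helper: it simply multiplies the per-CPU default by the CPU count.
default_job_mem_mb() {
    local mem_per_cpu_mb="$1" cpus="$2"
    echo $(( mem_per_cpu_mb * cpus ))
}

default_job_mem_mb 1930 4     # smallcpu with 4 CPUs:  7720
default_job_mem_mb 15560 16   # hugemem with 16 CPUs:  248960
```

If the resulting value is too small for your application, request more memory explicitly.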
To request a larger memory limit for your job, add the --mem option to your job submission script:
#SBATCH --mem=<size>[units]
This specifies the real memory required per node. Default units are megabytes. Different units can be specified using the suffix [K|M|G|T]. A memory size of zero (--mem=0) is treated as a special case and grants the job access to all of the memory on each node.
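A minimal job script using --mem might look like the sketch below. The partition, CPU count, walltime and the 16G value are arbitrary example choices, and site-specific options such as the account are left out. The helper `report_mem` is hypothetical; it only prints the `SLURM_MEM_PER_NODE` variable that Slurm sets in the job environment when --mem is used.

```shell
#!/bin/bash
#SBATCH --partition=smallcpu     # example partition from the table above
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G                # per node, instead of the 4 x 1930 MB default
#SBATCH --time=00:10:00

# Print the per-node memory request as recorded by Slurm.
# Outside of a job the variable is not set, so a placeholder is shown.
report_mem() {
    echo "Memory requested per node: ${SLURM_MEM_PER_NODE:-not set}"
}
report_mem
```

Run outside of Slurm, the script only prints the placeholder, since the `#SBATCH` lines are plain comments to the shell.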
Did you know?
Jobs which exceed their per-node memory limit are killed automatically by the batch system.
Other Memory Options
| Option | Comment |
|---|---|
| --mem-per-cpu=<size>[units] | Minimum memory required per usable allocated CPU |
| --mem-per-gpu=<size>[units] | Minimum memory required per allocated GPU |
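The sketch below requests memory per CPU rather than per node. According to the Slurm documentation, --mem, --mem-per-cpu and --mem-per-gpu are mutually exclusive, so only one of them should appear in a script. The helper `show_mem_request` is hypothetical; it reports whichever of the corresponding Slurm environment variables is set inside the job. All resource values are example choices.

```shell
#!/bin/bash
#SBATCH --partition=smallcpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4G         # 8 CPUs x 4 GB = 32 GB for the job
#SBATCH --time=00:10:00

# Report which kind of memory request Slurm recorded for this job.
show_mem_request() {
    if [ -n "${SLURM_MEM_PER_CPU:-}" ]; then
        echo "per CPU: ${SLURM_MEM_PER_CPU}"
    elif [ -n "${SLURM_MEM_PER_GPU:-}" ]; then
        echo "per GPU: ${SLURM_MEM_PER_GPU}"
    elif [ -n "${SLURM_MEM_PER_NODE:-}" ]; then
        echo "per node: ${SLURM_MEM_PER_NODE}"
    else
        echo "no Slurm memory variable set"
    fi
}
show_mem_request
```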
Available RAM at runtime
The RAM listed in the technical specification of our nodes is slightly larger than the memory that is effectively available: a small part is always reserved for the operating system, the parallel file system, the scheduler, etc. The memory limits that are relevant for a job, for example when specifying the --mem option, are therefore listed in the table below.
You can use the Slurm command sinfo to query all these limits. For example:
sinfo -e -o "%20P %16F %8z %.8m %.11l %18f" -S "+P+m" -M all
The output returns a list of our partitions and
- information on their nodes (allocated/idle/other/total)
- CPU specs of these nodes (sockets:cores:threads)
- size of real memory in megabytes
- walltime limits for job requests
- and feature constraints.
MOGON NHR
At the moment of writing, for example, the output on MOGON NHR looks like this:
PARTITION NODES(A/I/O/T) S:C:T MEMORY TIMELIMIT
a100ai 1/2/1/4 2:64:2 1992000 6-00:00:00
a100dl 1/8/2/11 2:64:1 1016000 6-00:00:00
a40 1/6/0/7 2:64:1 1016000 6-00:00:00
czlab 0/1/0/1 2:64:1 1031828 6-00:00:00
hugemem 0/1/3/4 2:64:1 1992000 6-00:00:00
komet 355/43/34/432 2:64:1 248000 6-00:00:00
largemem 0/19/9/28 2:64:1 1016000 6-00:00:00
longtime 9/0/1/10 2:64:1 248000 12-00:00:00
longtime 10/0/0/10 2:64:1 504000 12-00:00:00
mi250 0/2/0/2 2:64:1 1016000 6-00:00:00
mod 167/4/5/176 2:64:1 504000 6-00:00:00
parallel 355/43/34/432 2:64:1 248000 6-00:00:00
parallel 167/4/5/176 2:64:1 504000 6-00:00:00
quick 355/43/34/432 2:64:1 248000 8:00:00
smallcpu 355/43/34/432 2:64:1 248000 6-00:00:00
topml 0/1/0/1 2:48:2 1547259 6-00:00:00

CPU nodes
| Memory [MB] | Number of Nodes |
|---|---|
| $248000$ | 432 |
| $504000$ | 176 |
| $1016000$ | 28 |
| $1992000$ | 4 |

GPU nodes
| Memory [MB] | Number of Nodes |
|---|---|
| $1016000$ | 20 |
| $1992000$ | 4 |
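To find partitions whose nodes can satisfy a given --mem request, the sinfo output shown earlier can be filtered on its memory column. The following is a sketch that assumes the column order of the sinfo format string used on this page (partition first, memory fourth); the function name `partitions_with_mem` is hypothetical. The demo feeds it a few rows copied from the sample output instead of calling sinfo.

```shell
#!/bin/bash
# List partitions whose nodes have at least the given amount of memory (MB).
# Reads sinfo output from stdin; expects partition in column 1, memory in column 4.
partitions_with_mem() {
    local min_mb="$1"
    # Only rows with a purely numeric 4th field are data rows; this skips headers.
    awk -v min="$min_mb" '$4 ~ /^[0-9]+$/ && $4 + 0 >= min + 0 { print $1 }' | sort -u
}

# Demo on rows copied from the sample output on this page.
partitions_with_mem 1000000 <<'EOF'
PARTITION NODES(A/I/O/T) S:C:T MEMORY TIMELIMIT
a100ai 1/2/1/4 2:64:2 1992000 6-00:00:00
komet 355/43/34/432 2:64:1 248000 6-00:00:00
largemem 0/19/9/28 2:64:1 1016000 6-00:00:00
parallel 167/4/5/176 2:64:1 504000 6-00:00:00
EOF
# prints a100ai and largemem, one per line
```

On a login node, the sinfo command from this page can be piped into the function in place of the here-document.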