Local Scratch
On every node, there is local scratch space available to your running jobs.
Every job can therefore use a directory called /localscratch/${SLURM_JOB_ID}/ on the local disk. For a job array, the directory is likewise called /localscratch/${SLURM_JOB_ID}/; the variable SLURM_ARRAY_TASK_ID is only the index of a subjob within the job array and is unrelated to $SLURM_JOB_ID.
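As a minimal sketch of the naming described above, the following snippet builds the directory name from the environment and prints the array index next to it. The fallback values (12345 and "none") are placeholders so that the snippet can also be tried outside of a running job; they are not part of the cluster setup.

```shell
#!/bin/bash
# Fallback values are hypothetical placeholders for runs outside of a job.
job_id="${SLURM_JOB_ID:-12345}"
task_id="${SLURM_ARRAY_TASK_ID:-none}"

# The directory name depends on the job ID only, not on the array index.
JOBDIR="/localscratch/${job_id}"

echo "local scratch directory: ${JOBDIR}"
echo "array task index:        ${task_id}"
```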
When to use Local Scratch
If your job(s) in question merely read and write big files linearly, there is no need to use the local scratch or a ramdisk. However, there are scenarios where using the local scratch might be beneficial:
- if your job produces many temporary files
- if your job reads a file or a set of files in a directory repeatedly during run time (with multiple threads or concurrent jobs, this means a random access pattern on the global file system, which is a true performance killer)
If your job runs on multiple nodes, you cannot use the local scratch space on one node from the other nodes.
If you need your input data on every node, please refer to the section Copy files to multiple nodes via job script.
For the further explanation on this page, we assume you have a program called my_program, which reads input data from ./input_file, writes output data to ./output_file and periodically writes a checkpoint file called ./restart_file.
The program shall be executed on a whole node with 64 processors. It probably uses OpenMP.
Assume you would normally start the program in the current working directory where it will read and write its data like this:
sbatch -N1 -p parallel ./my_program
Now to get the performance of local disk access, you want to use the aforementioned local scratch space on the compute node.
Available Space
Please keep in mind that the free space on /localscratch/${SLURM_JOB_ID}/ at the time the job starts might be shared with other users. If you need the total space to be available to you for the whole job, you should request the whole node, for example by allocating all CPUs.
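Since the free space may be shared, it can help to check it at the start of a job script. The following is a minimal sketch using the standard df command; the helper name free_kb and the 1 GiB threshold are hypothetical, and the fallback to a temporary directory only serves to make the snippet runnable outside of a job.

```shell
#!/bin/bash
# free_kb DIR: print the free space of the file system holding DIR, in KiB.
# 'df -P' guarantees one line per file system; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Inside a job this directory exists; outside of a job we fall back to a
# temporary directory (assumption, for trying the snippet only).
JOBDIR="/localscratch/${SLURM_JOB_ID:-}"
[ -d "$JOBDIR" ] || JOBDIR="${TMPDIR:-/tmp}"

required_kb=$((1024 * 1024))   # hypothetical requirement: 1 GiB
available_kb=$(free_kb "$JOBDIR")

if [ "$available_kb" -lt "$required_kb" ]; then
    echo "warning: only ${available_kb} KiB free in ${JOBDIR}" >&2
fi
```

Note that such a check only reflects the moment it is run; other jobs on the same node may still fill the disk later.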
Copy files via job script and signalling batch scripts with SLURM
The following example will submit a jobscript, where SLURM will send a signal to the job script prior to ending. This will enable the jobscript to collect data written to the local scratch directory or directories.
#!/bin/bash
#SBATCH -A <mogon-project>
#SBATCH -p parallel
#SBATCH -t <appropriate time>
#SBATCH --signal=B:SIGUSR2@600 # e.g. signal 10 minutes before the job will end
# time, here, is defined in seconds.
# Store working directory to be safe
SAVEDPWD=$(pwd)
# We define a bash function to do the cleaning when the signal is caught
cleanup(){
    # Note: The following only works for single-node jobs whose output is
    # written on the node where the jobscript is running.
    # For multinode output, you can use the 'sgather' command or
    # get in touch with us, if the case is more complex.
    cp "/localscratch/${SLURM_JOB_ID}/output_file" "${SAVEDPWD}/" &
    local pid_output=$!
    cp "/localscratch/${SLURM_JOB_ID}/restart_file" "${SAVEDPWD}/" &
    local pid_restart=$!
    # Wait for the two copies only, not for the application
    wait "$pid_output" "$pid_restart"
    exit 0
}
# Register the cleanup function when SIGUSR2 is sent,
# ten minutes before the job gets killed
trap 'cleanup' SIGUSR2
# Copy input files
cp "${SAVEDPWD}/input_file" "/localscratch/${SLURM_JOB_ID}"
cp "${SAVEDPWD}/restart_file" "/localscratch/${SLURM_JOB_ID}"
# Go to jobdir and start the program
cd "/localscratch/${SLURM_JOB_ID}"
# Run the program in the background and wait for it: bash defers a trap
# until the current foreground command has finished, whereas 'wait'
# returns as soon as the signal arrives.
"${SAVEDPWD}/my_program" &
wait $!
# Call the cleanup function when everything went fine
cleanup
Signalling in SLURM – difference between signalling submission scripts and applications
In SLURM, applications do not automatically get a signal before hitting the walltime. It needs to be requested explicitly:
sbatch --signal=SIGUSR2@600 ...
This would send the signal SIGUSR2 to the application ten minutes before hitting the walltime of the job. Note that the SLURM documentation states that there is an uncertainty of up to 1 minute: the signal may be sent up to 60 seconds earlier than specified.
Usually this requires you to use
sbatch --signal=B:SIGUSR2@600 ...
within a submission script to signal the batch shell itself (by default, the signal is sent to all children of the batch job, i.e. the job steps, but not to the batch shell itself). The reason is: if you use a submission script like the one above, you trap the signal within the script, not within the application.
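Trapping the signal within the script can be tried out without SLURM. The following is a minimal sketch in which the script sends SIGUSR2 to itself; the sleep command stands in for the application, and all names (on_usr2, caught, app_pid) are made up for this illustration. It relies on documented bash behaviour: a trap is deferred while a foreground command runs, but the wait builtin returns immediately, with a status greater than 128, when a trapped signal arrives.

```shell
#!/bin/bash
caught=0
on_usr2() { caught=1; }
trap on_usr2 SIGUSR2

# Stand-in for the long-running application, started in the background
sleep 30 &
app_pid=$!

# Stand-in for SLURM: send SIGUSR2 to this shell after one second
( sleep 1; kill -USR2 $$ ) &

# Returns early once the signal arrives; the handler has run by the
# time the next command is executed.
wait "$app_pid"
status=$?

# Stop the stand-in application; in a real job, SLURM ends the remaining
# processes when the batch script exits.
kill "$app_pid" 2>/dev/null

echo "caught=${caught} wait_status=${status}"
```

If the stand-in application were started in the foreground instead, the handler would only run after the full 30 seconds had passed.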
Copy files to multiple nodes via job script
The following script can be used to ensure that input files are present in the job directory on all nodes.
The demonstrated sbcast command can also be used for the one-node example above.
#!/bin/bash
#SBATCH -N 2
# use other parameterization as appropriate
JOBDIR="/localscratch/${SLURM_JOB_ID}"
# copy the input file on all nodes
sbcast <somefile> $JOBDIR/<somefile>
# NOTE: Unlike 'cp' which accepts a directory and would assume that
# the destination file carries the same name, 'sbcast'
# requires that a filename is given for the destination.
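When several input files are needed, the destination file name has to be spelled out for each of them. The following is a minimal sketch of a helper for this; the function name stage_inputs is hypothetical, and the fallback to cp when no job is running (or sbcast is not installed) is an assumption added only so that the function can be tried on a single machine.

```shell
#!/bin/bash
# stage_inputs DEST_DIR FILE...
# Copy each FILE to DEST_DIR, keeping its base name.
stage_inputs() {
    local dest_dir=$1
    shift
    local f
    for f in "$@"; do
        if command -v sbcast >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
            # Inside a job: broadcast to all allocated nodes.
            # sbcast needs the full destination file name.
            sbcast "$f" "${dest_dir}/$(basename "$f")" || return 1
        else
            # Outside a job (assumption, for testing): plain local copy.
            cp "$f" "${dest_dir}/" || return 1
        fi
    done
}
```

In a job script, the call could look like: stage_inputs "$JOBDIR" input_file restart_file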