R
Installing R Libraries
Missing R libraries can be installed with a simple install.packages(c("<package_A>", "<package_B>")). This command will ask for a library directory, and you can point it to your home directory.
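A minimal sketch of such an installation, assuming a personal library directory under your home; the package names and the path are placeholders:

    # install into a personal library in the home directory (the directory must exist)
    install.packages(c("data.table", "ggplot2"), lib = "~/R/library")
    # make the personal library visible in later sessions
    .libPaths(c("~/R/library", .libPaths()))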
Please ask the HPC team to install larger libraries, or libraries of (putatively) wider interest, for you.
Listing Installed Libraries
To see all installed libraries for a specific (module) version of R type:
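For example, with base R functions:

    # list every package visible to the currently loaded R module
    library()
    # or, as a matrix with versions and library paths
    installed.packages()[, c("Package", "Version", "LibPath")]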
To limit your view to user installed packages, type:
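One possible approach, sketched here under the assumption that user packages live outside the system-wide libraries:

    # keep only packages installed outside the system libraries
    ip <- installed.packages()
    ip[!(ip[, "LibPath"] %in% c(.Library, .Library.site)), c("Package", "Version"), drop = FALSE]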
Suppressing output
By default, R writes a lot of messages to stdout. Using the function sink() this output can be suppressed, and with SLURM it can be redirected to a file.
sink() is used as follows:
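A short sketch; the log-file name is just a placeholder:

    # divert console output to a file
    sink("analysis.log")
    print(summary(rnorm(100)))
    # restore output to stdout
    sink()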
Compiling functions
Compiling user-defined functions with the compiler library can help to speed up code enormously. This holds for lengthy functions and numerical code; it does not for functions already defined within R.
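A minimal sketch; the function itself is only a toy numerical example:

    library(compiler)
    # a user-defined numerical function
    f <- function(n) {
      s <- 0
      for (i in 1:n) s <- s + i^2
      s
    }
    # byte-compile it; fc behaves like f, but usually runs faster
    fc <- cmpfun(f)
    fc(1e6)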
There is also the option of a just-in-time (JIT) compiler in this library:
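For example:

    library(compiler)
    # enable just-in-time compilation; 3 is the highest level
    enableJIT(3)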
The numeric argument of enableJIT specifies the "level" of compilation: at levels 2 and 3 more functions are compiled. In a few rare cases enableJIT will slow down code, particularly if most of it is already pre-compiled or written in C, and/or if the code creates functions repeatedly, which then need to be compiled every time. This is more likely to happen with enableJIT(2) or enableJIT(3), though these also have the greater potential to speed up code.
Bioconductor
Bioconductor is a versatile collection of packages for handling biological data, built upon the R statistical programming language. By loading the module R/3.1.2 or a newer version, Bioconductor with MPI support becomes available. Missing Bioconductor packages can be installed or updated upon request, albeit not in a rapid fashion.
Please also ensure that the appropriate packages are loaded before submitting your job.
Submitting R Scripts
There are two ways to submit scripts in interpreted languages:
Submitting using another script
The jobscript can be written in any other scripting language (e.g. bash). Invoke $ sbatch <jobscript> with a jobscript like:
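A sketch of such a jobscript; partition, resources, module version and script name are placeholders and need to be adapted:

    #!/bin/bash
    #SBATCH --job-name=r_job
    #SBATCH --partition=<partition>
    #SBATCH --nodes=1
    #SBATCH --time=01:00:00

    module load R/3.1.2          # or a newer R module

    # limit OpenBLAS threading (see the OMP_NUM_THREADS note below)
    export OMP_NUM_THREADS=1

    srun R --no-save -f my_script.R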
Note the --no-save flag. This prevents namespace clashes if you have used R previously in different scenarios.
OMP_NUM_THREADS: On MOGON, R is linked against OpenBLAS. If unlimited, OpenBLAS tries to create as many threads as there are cores in our environment. If you create 64 instances of R (on MOGON I), as in the example below, there will be $64\times 64$ threads. This is not meaningful, would slow the computation tremendously, and is simply not possible. Therefore, when working on bigger matrices, there should be fewer R instances than cores, and OMP_NUM_THREADS can be set to a higher value, such that OMP_NUM_THREADS * number of R tasks = number of cores.
Submitting using R
Here, the problem lies in the way the shell treats interpreted scripts: programmers have to supply a shebang along with a fully qualified path to the interpreter. So, after loading a given R module, the which command gives you the desired path:
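For example; the module version is an example and the output path is only a placeholder:

    $ module load R/3.1.2
    $ which R
    /path/to/R-module/bin/R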
From here on, the submission header is analogous to the standard one in bash.
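A sketch of how such a header might look inside the R script itself, assuming Rscript as the interpreter (the path is a placeholder taken from which; sbatch reads the #SBATCH lines, which are ordinary comments to R):

    #!/path/to/R-module/bin/Rscript
    #SBATCH --job-name=r_direct
    #SBATCH --nodes=1
    #SBATCH --time=01:00:00

    # ... R code follows ...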
Using R in parallel
R offers several packages for parallel or high-performance computing. Rmpi and snow are probably the most common packages for that purpose in conjunction with MPI (the snow link contains a number of examples).
Rmpi has been installed with various R modules. Loading such an R module issues a warning that the respective MPI module has to be loaded. Ignore this warning if you are not going to use MPI.
In addition to Rmpi and snow (see below) we provide the doMPI package. See below for an example.
For the following example we will look at the snow library.
A simple socket cluster
First, we will look at an example confined to a single node. The script works on a number of files named *.in and is to be submitted with the shell-script example from above.
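A minimal sketch of such a script; the per-file processing function is a placeholder for your own code:

    library(snow)

    # all input files in the working directory
    files <- list.files(pattern = "\\.in$")

    # placeholder worker: read one file and return a summary
    process_file <- function(f) {
      dat <- read.table(f)
      summary(dat)
    }

    # one worker per core allocated by SLURM (falls back to 4)
    ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", "4"))
    cl <- makeSOCKcluster(rep("localhost", ncores))

    results <- clusterApplyLB(cl, files, process_file)
    stopCluster(cl)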
Using Rmpi & snow (sample script)
The following sample script shows how to use the packages Rmpi and snow. Note that the modules used to compile Rmpi have to be loaded prior to submitting the script or else the script will crash.
The script uses a callback function; within this function you should place your code to be parallelized. Such a function can, of course, be called multiple times and should be defined outside the actual clusterCall() call.
This script is provided and edited courtesy of W. Spitz (University of Bonn):
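The original script is not reproduced here; the following is only a minimal sketch of the Rmpi/snow pattern it follows, with the callback work() as a placeholder for your own code:

    library(Rmpi)
    library(snow)

    # callback function: place the code to be parallelized here
    work <- function() {
      sum(rnorm(1e6))
    }

    # one snow worker per available MPI task (minus the master)
    ntasks <- mpi.universe.size() - 1
    cl <- makeMPIcluster(ntasks)

    # run the callback once on every worker
    results <- clusterCall(cl, work)
    print(unlist(results))

    stopCluster(cl)
    mpi.quit()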
In order to submit parallel R scripts using Rmpi, invoke $ sbatch <jobscript> with a jobscript like:
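A sketch of such a jobscript; partition, task count, module names and script name are placeholders, and whether mpirun or srun is appropriate depends on the MPI module in use:

    #!/bin/bash
    #SBATCH --job-name=rmpi_snow
    #SBATCH --partition=<partition>
    #SBATCH --ntasks=64
    #SBATCH --time=02:00:00

    module load <mpi-module>       # the MPI module Rmpi was compiled against
    module load R/3.1.2            # or a newer R module

    export OMP_NUM_THREADS=1

    # start one master process; Rmpi/snow spawns the workers
    mpirun -np 1 R --no-save -f rmpi_snow_script.R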
Note the --no-save flag. This prevents namespace clashes if you have used R previously in different scenarios.
doMPI
This example script shows the basic usage:
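A minimal sketch of a doMPI script; the foreach body is a placeholder workload:

    library(doMPI)

    # start a cluster on the MPI tasks provided by the scheduler
    cl <- startMPIcluster()
    registerDoMPI(cl)

    # placeholder parallel loop
    results <- foreach(i = 1:100, .combine = c) %dopar% {
      mean(rnorm(1e5))
    }
    print(length(results))

    closeCluster(cl)
    mpi.quit()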
It can be started with $ sbatch jobscript.sh, like in the example above.
Dynamic Parallelization
Dynamic parallelization is offered by the package parallelize.dynamic. Apart from the documentation on the project site, there is a paper describing its setup.