Julia is a high-performance, high-level, dynamic programming language.
Distinctive aspects of Julia’s design include a type system with parametric polymorphism and multiple dispatch as its core programming paradigm. Julia supports concurrent, composable parallel and distributed computing (with or without MPI, and with built-in “OpenMP-style” threads), as well as direct calling of C and Fortran libraries without glue code. Julia uses a just-in-time (JIT) compiler that is referred to as “just-ahead-of-time” (JAOT) in the Julia community, since Julia by default compiles all code to machine code before running it.
Initially, you need to load a Julia module on a MOGON service node (the module name and version below are placeholders; check `module avail` for the exact ones available), e.g. with:
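```bash
module load lang/Julia   # placeholder module name
```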
For package installation, we will use the REPL (read-eval-print loop) that is built into the julia executable. Start Julia by running:
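```bash
julia
```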
from the command line. Now start the Pkg REPL (Pkg comes with its own REPL) by pressing ]. Upon entering the Pkg REPL, the command line prompt should look like this (the version in parentheses reflects the Julia version you loaded):
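```
(@v1.10) pkg>
```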
We will use the Dates package to illustrate the general procedure for installing packages:
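```
(@v1.10) pkg> add Dates
```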
Pkg will resolve the package versions and record Dates in the Project.toml and Manifest.toml of your default environment.
Let’s verify the successful installation. First, we display the status of the packages in our standard project:
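```
(@v1.10) pkg> status
```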
Depending on which packages you have already installed, the output lists all packages in your default environment, and Dates should appear among them.
This indicates that Dates has been successfully installed; to confirm it, we can for example run the package’s tests with Pkg:
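```
(@v1.10) pkg> test Dates
```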
The installation was successful and Dates can now be included in a .jl file via the following line:
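```julia
using Dates
```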
You can now add any package from the Standard Library to Julia in the same way, but please note the following:
Package Dependencies
Some Julia packages require you to load prerequisite dependencies as modules before you can add them via Pkg.add!
This is illustrated by the CUDA.jl and Plots.jl examples below.
In the Pkg REPL you have the following commands available to manage packages:
| Command | Result | Comment |
| --- | --- | --- |
| `add` | Adds a package | It is possible to add multiple packages in one command: `add A B C` |
| `test` | Runs the tests for a package | |
| `build` | Explicitly runs the build step for a package | `build` is run automatically when a package is first installed with `add` |
| `rm` | Removes a package | Removes only that package from the project. To remove a package that only exists as a dependency, use `rm --manifest DepPackage`. Note that this will remove all packages that depend on `DepPackage`. |
| `up` | Updates a package and all its dependencies | The flag `--minor` updates only the minor versions of packages, reducing the risk of breaking projects. |
Julia’s CUDA.jl package is the main entry point for programming NVIDIA GPUs using CUDA. The Julia CUDA stack requires a functional CUDA setup, which includes the NVIDIA driver and the corresponding CUDA toolkit. These are either available as modules or already integrated into the MOGON GPU nodes. To use CUDA.jl with Julia, proceed as follows.
Log in to MOGON and load the following modules on a service node first (placeholder names; check `module avail` for the exact versions):
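```bash
module load lang/Julia system/CUDA   # placeholder module names
```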
Now start Julia with the following command:
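```bash
julia
```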
and then change to the Pkg REPL with ]. The command line prompt should now look like:
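```
(@v1.10) pkg>
```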
We are now ready to add CUDA via:
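```
(@v1.10) pkg> add CUDA
```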
You are now ready to use CUDA.jl with Julia on MOGON. Take a look at our GPU section below for an easy approach to CUDA.jl on MOGON.
Plots.jl is a visualization interface and toolset for powerful visualizations in Julia. Here we explain how to add the Plots.jl package to Julia and set up the PyPlot backend.
Log in to MOGON and load the following modules on a service node first (placeholder names; the PyPlot backend additionally requires a Python installation with matplotlib):
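```bash
module load lang/Julia lang/Python   # placeholder module names
```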
Open Julia by executing the following command after the modules have been successfully loaded:
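```bash
julia
```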
now enter the Pkg REPL by pressing ]; the command line prompt should look like:
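```
(@v1.10) pkg>
```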
First, the actual packages are added and then the backend is configured. Install Plots.jl together with the PyPlot backend package:
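```
(@v1.10) pkg> add Plots PyPlot
```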
Now leave the Pkg REPL (press backspace) and set the backend to PyPlot with:
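```julia
julia> using Plots

julia> pyplot()
Plots.PyPlotBackend()
```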
Afterwards, test the successful installation, for example with a simple plot:
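```julia
julia> plot(sin, 0, 2π)   # any plotting call will do; a rendered figure confirms the setup
```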
An overview of available packages for Julia can be found in the Julia Documentation and the JuliaLang Github Repo. The most noteworthy packages (i.e. you will probably install them at some point) are:
Julia offers two main possibilities for parallel computing: multi-threading-based parallelism, which is essentially shared-memory parallelism, and distributed processing, which parallelizes code across different Julia processes.
The number of execution threads is controlled either by the -t/--threads command line argument or by the JULIA_NUM_THREADS environment variable. When both are specified, -t/--threads takes precedence.
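For example, to start Julia with four threads:

```bash
export JULIA_NUM_THREADS=4   # via the environment variable …
julia --threads 4            # … or via the flag, which takes precedence
```

Inside the session, you can verify the thread count and run a simple threaded loop:

```julia
julia> Threads.nthreads()    # confirm the number of threads
4

julia> Threads.@threads for i in 1:8
           println("iteration $i runs on thread $(Threads.threadid())")
       end
```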
Starting Julia with julia -p n provides n worker processes on the local machine. Generally it makes sense for n to equal the number of CPU threads (logical cores) on the machine. Note that the -p argument implicitly loads the Distributed module (see the Julia Documentation).
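A minimal sketch of distributing work across those workers:

```julia
# started with: julia -p 4
using Distributed              # already loaded implicitly by -p; explicit for clarity

@everywhere square(x) = x^2    # define the function on every worker

results = pmap(square, 1:100)  # distribute the calls across the worker processes
```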
First, Julia must be configured for the use of MPI. For this purpose, the MPI Wrapper for Julia (MPI.jl) is used.
Log in to one of our service nodes and then load Julia and the desired MPI module via (placeholder module names):
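```bash
module load lang/Julia mpi/OpenMPI   # placeholder module names
```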
Next, you need to build the MPI package for Julia with Pkg:
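```
(@v1.10) pkg> add MPI
(@v1.10) pkg> build MPI
```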
If the installation and build were successful, the output should be similar to the following:
Before you start parallelising with Julia on MOGON GPUs, you need to prepare your Julia environment for the use of GPUs, as we explained earlier in the section about CUDA.jl. After successfully setting up CUDA.jl, you can directly start utilising the advantages of GPUs. We have provided some examples below to make it easier for you to get started with Julia on MOGON GPUs and to illustrate the advantages of GPUs.
The test estimates how fast data can be sent to and read from the GPU. Since the GPU is plugged into a PCI bus, this largely depends on the speed of the PCI bus as well as many other factors. The measurements also include some overhead, in particular the overhead of function calls and array allocation time; since these are present in any “real” use of the GPU, it is reasonable to include them. Memory is allocated and data is sent to the GPU using Julia’s CUDA.jl package; memory is allocated and data is transferred back to CPU memory using Julia’s native Array() function.
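A minimal sketch of such a measurement (not the exact benchmark script used here; array size and timing method are our assumptions):

```julia
using CUDA, BenchmarkTools

n = 2^24                        # number of Float32 elements – an assumed size
x = rand(Float32, n)            # array in CPU memory

t_h2d = @belapsed CuArray($x)   # time to send the data to the GPU
xd = CuArray(x)
t_d2h = @belapsed Array($xd)    # time to read the data back into CPU memory

gb = sizeof(x) / 1e9
println("host → device: ", round(gb / t_h2d, digits = 2), " GB/s")
println("device → host: ", round(gb / t_d2h, digits = 2), " GB/s")
```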
The theoretical bandwidth per lane for PCIe 3.0 is $0.985 GB/s$. For the GTX 1080Ti (PCIe3 x16) used in our MOGON GPU nodes, the 16-lane slot could theoretically give $15.754 GB/s$[^1].
The job script is pretty ordinary. In this example, we use only one GPU and start Julia with four threads. To do this, we request one process with four CPUs for multithreading (partition, module, and file names in the sketch below are placeholders):
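```bash
#!/bin/bash
#SBATCH --job-name=gpu_bandwidth
#SBATCH --partition=<gpu-partition>   # placeholder – use the appropriate MOGON GPU partition
#SBATCH --gres=gpu:1                  # one GPU
#SBATCH --ntasks=1                    # one process …
#SBATCH --cpus-per-task=4             # … with four CPUs for multithreading
#SBATCH --time=00:30:00

module load lang/Julia                # placeholder module name

julia --threads 4 gpu_bandwidth.jl   # placeholder script name
```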
The job is submitted with the following command (the script name is a placeholder):
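```bash
sbatch gpu_bandwidth.sh   # placeholder file name
```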
The job will be finished after a few minutes; you can view the output as follows (substitute your job ID):
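```bash
cat slurm-<jobid>.out
```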
The output should be similar to the following lines:
The Julia script also generates a plot, which we would like to show here:
You might be familiar with this example if you stumbled upon our MATLAB article or read it on purpose. At this point, we would simply like to restate what we originally took from the MATLAB Help Center:
> For operations where the number of floating-point computations performed per element read from or written to memory is high, the memory speed is much less important. In this case the number and speed of the floating-point units is the limiting factor. These operations are said to have high “computational density”.
A good test of computational performance is a matrix-matrix multiply. For multiplying two $N \times N$ matrices, the total number of floating-point calculations is
$$ FLOPS(N) = 2N^3 - N^2 $$
Two input matrices are read and one resulting matrix is written, for a total of $3N^2$ elements read or written. This gives a computational density of $(2N - 1)/3$ FLOP/element. Contrast this with plus as used above, which has a computational density of $1/2$ FLOP/element.
The difference from our MATLAB article is of course the adaptation to native Julia code. Even so, we have made a few alterations owing to the use of the Julia language. When defining vectors or arrays, we have deliberately chosen Float32, since GPUs are faster when working with Float32 than with Float64. In addition, CuArrays default to Float32, as do functions like CUDA.rand or CUDA.zeros.
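A minimal sketch of such a matrix-matrix multiply benchmark (matrix size and timing method are our assumptions):

```julia
using CUDA, BenchmarkTools

N = 4096                        # assumed matrix size
A = CUDA.rand(Float32, N, N)    # CUDA.rand defaults to Float32
B = CUDA.rand(Float32, N, N)

# CUDA.@sync ensures the GPU has finished before the timer stops
t = @belapsed CUDA.@sync $A * $B

flops = 2 * N^3 - N^2           # total floating-point operations (see the formula above)
println(round(flops / t / 1e9, digits = 1), " GFLOPS")
```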
The job script is quite ordinary. In this example, we again use only one GPU and start Julia with four threads. For this, we request one process with four CPUs for multithreading.
You can submit the job (the script name is again a placeholder) by simply executing:
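```bash
sbatch gpu_gemm.sh   # placeholder file name
```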
The job will be completed after a couple of minutes and you can view the output with:
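```bash
cat slurm-<jobid>.out   # substitute the job ID reported by sbatch
```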
The output should resemble the following lines:
The graphic generated in the script is shown below: