Anaconda

Create your own work environments with Anaconda

Note: this page requires further review and polishing. It is a work in progress. Feedback is appreciated.

Anaconda is a packaging system for software and eases the deployment of (statically) compiled software. It can be used to build your own development and research environment. This lets you download and use the software you need without asking an admin to install it on the cluster.

Over time, several build systems have been derived directly from (Ana-)Conda, of which Conda is the most prominent one and particularly well established in bioinformatics. Conda is written entirely in Python and executes rather slowly. Mamba is a C++ implementation which is considerably faster, as it searches its repositories in parallel. All these solutions come with a serious caveat: they create many files.

Micromamba

To address this issue, we recommend using Micromamba on our HPC facilities. The benefit: no more package files that need manual cleanup. The drawback: whereas Mamba is a drop-in replacement for Conda (that is, instead of typing conda you can type mamba), Micromamba needs its full name, micromamba, whenever you use it (see below). In the following we give some hints about using Micromamba. You may still refer to the conda cheat sheet, as it is very useful for beginners. Just remember that micromamba works slightly differently.


Setting up Micromamba for the first time

First of all, we need to install Micromamba. We advise installing it in your home directory. To do so, execute:

cd ~ && "${SHELL}" <(curl -L micro.mamba.pm/install.sh)

During this install process you are asked several questions. We recommend confirming them all.
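To check that the installation succeeded, a small sketch like the following may help. It assumes the installer's default location ~/.local/bin (the same path that appears in the .bashrc entry shown further down); the function name find_micromamba is made up for this example.

```shell
# Hypothetical helper: print the path of the micromamba binary, if any.
# Assumes the installer default of ~/.local/bin unless a directory is given.
find_micromamba() {
    bin_dir="${1:-$HOME/.local/bin}"
    if [ -x "$bin_dir/micromamba" ]; then
        echo "$bin_dir/micromamba"
    else
        echo "micromamba not found in $bin_dir" >&2
        return 1
    fi
}

# Usage: report the binary's version, or a hint if it is missing.
if exe="$(find_micromamba)"; then
    "$exe" --version
else
    echo "Run the installer first, then open a new shell."
fi
```

If you chose a different location during the install dialogue, pass that directory as the first argument.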

Separating Conda/Micromamba Environments from the Module Environments

After installing Conda or Micromamba, you will find an entry in your .bashrc like this:

# >>> mamba initialize >>>
# !! Contents within this block are managed by 'mamba init' !!
export MAMBA_EXE='/gpfs/fs1/home/username/.local/bin/micromamba';
export MAMBA_ROOT_PREFIX='/gpfs/fs1/home/username/micromamba';
__mamba_setup="$("$MAMBA_EXE" shell hook --shell bash --root-prefix "$MAMBA_ROOT_PREFIX" 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__mamba_setup"
else
    alias micromamba="$MAMBA_EXE"  # Fallback on help from mamba activate
fi
unset __mamba_setup

You can wrap this code in a (bash) function, like this (pay attention to the first and last lines):

function conda_initialize {
# >>> mamba initialize >>>
# !! Contents within this block are managed by 'mamba init' !!
export MAMBA_EXE='/gpfs/fs1/home/username/.local/bin/micromamba';
export MAMBA_ROOT_PREFIX='/gpfs/fs1/home/username/micromamba';
__mamba_setup="$("$MAMBA_EXE" shell hook --shell bash --root-prefix "$MAMBA_ROOT_PREFIX" 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__mamba_setup"
else
    alias micromamba="$MAMBA_EXE"  # Fallback on help from mamba activate
fi
unset __mamba_setup
}

You need to call the function conda_initialize every time you want to use micromamba. This avoids the (potentially slow) initialization upon every login, and it lets you decide per job script whether to initialize micromamba: some job scripts can use micromamba environment(s), others module environments.

Note that mixing Conda/Micromamba environments with module environments is prone to errors!
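One way to guard against accidental mixing is to check for loaded modules before initializing. The following is only a sketch: it assumes that the module system exports the variable LOADEDMODULES (as Environment Modules and Lmod usually do), and the function names modules_loaded and safe_conda_initialize are hypothetical.

```shell
# Hypothetical guard: refuse to initialize while modules are loaded.
# Assumes the module system exports LOADEDMODULES (colon-separated list).
modules_loaded() {
    [ -n "${LOADEDMODULES:-}" ]
}

safe_conda_initialize() {
    if modules_loaded; then
        echo "Modules loaded: $LOADEDMODULES" >&2
        echo "Run 'module purge' first, or use a fresh shell." >&2
        return 1
    fi
    conda_initialize   # the wrapper function defined in ~/.bashrc
}
```

In a job script you would then call safe_conda_initialize instead of conda_initialize and stop if it fails.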

Configuring Conda/Micromamba for better User Experience

Conda can be configured with a file called ~/.condarc to avoid too many questions, warnings or even error messages during installation. The configuration also speeds up the search for software packages. Micromamba adheres to this configuration. To configure your environment, open the configuration file with an editor, e.g.:

nano ~/.condarc

and insert this file content:

create_default_packages:
  - setuptools
channels:
  - conda-forge
  - bioconda
  - defaults
  - r
proxy_servers:
  http: http://webproxy.zdv.uni-mainz.de:8888
ssl_verify: false
channel_priority: strict
always_yes: true 
env_prompt: '($(basename {default_env}))'

With this configuration file you:

  • install the Python setup tools by default (so that internal pip installs work)
  • restrict the search to the most prominent repository channels (add further channels, if desired)
  • set the proxy server (without it, conda will not find any software)
  • skip the check of SSL certificates, as the local setup redirects https to http anyway
  • check the channels in the order in which they are listed
  • are not asked for confirmation upon install requests
  • and finally, with the last line, ensure that the display of unnamed environments (see below) does not overextend your terminal prompt.

You may use different resource file settings, see the documentation for further hints on the configuration.
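If you prefer not to use an editor, the same content can be written from the shell. The sketch below uses a made-up helper name, write_condarc, and deliberately refuses to overwrite an existing file; the file content is the one listed above.

```shell
# Hypothetical helper: write the configuration shown above to a file,
# but never overwrite an existing one. Default target is ~/.condarc.
write_condarc() {
    target="${1:-$HOME/.condarc}"
    if [ -e "$target" ]; then
        echo "$target already exists, leaving it untouched" >&2
        return 1
    fi
    # The quoted delimiter ('EOF') keeps the shell from expanding
    # $(basename ...) in the env_prompt line.
    cat > "$target" <<'EOF'
create_default_packages:
  - setuptools
channels:
  - conda-forge
  - bioconda
  - defaults
  - r
proxy_servers:
  http: http://webproxy.zdv.uni-mainz.de:8888
ssl_verify: false
channel_priority: strict
always_yes: true
env_prompt: '($(basename {default_env}))'
EOF
}
```

Calling write_condarc without an argument targets ~/.condarc; pass another path to try it out on a scratch file first.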

Using Micromamba

First Time Users

If you are still working in the login shell in which you ran the curl command, you might need to source your .bashrc file. In this case execute:

source ~/.bashrc

Whenever you open a new shell or log in again, this is executed automatically and is no longer needed.

If you added the function above you need to run it to initialize micromamba:

conda_initialize

Whereas Conda would now display a so-called (base) environment, Micromamba does not create such a base environment. This avoids the duplication of files from base to other environments.
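In scripts it can be handy to run the initialization only when it is actually needed. The following sketch assumes the conda_initialize wrapper from above is defined in your ~/.bashrc; the name ensure_micromamba is hypothetical.

```shell
# Hypothetical helper for scripts: make the 'micromamba' shell function
# available, calling the conda_initialize wrapper only when needed.
ensure_micromamba() {
    # For a shell function, 'command -v' prints just the name.
    if [ "$(command -v micromamba 2>/dev/null)" = "micromamba" ]; then
        return 0    # already initialized in this shell
    fi
    if ! command -v conda_initialize >/dev/null 2>&1; then
        echo "conda_initialize is not defined; source ~/.bashrc first" >&2
        return 1
    fi
    conda_initialize
}
```

The check looks for the shell function rather than the binary, because activating environments relies on the function that the initialization sets up.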

Using Environments

We recommend installing software in bundles, so-called environments. It is best to have one environment per workflow.

To set up an environment you may choose between:

  • installing a (named) environment in your HOME directory. This might work for single users, but carries the risk of hitting the file quota limit in your HOME
  • or installing it in your project directory. This has the drawback of being an unnamed environment, but may serve entire groups.
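To get a feeling for how close an environment brings you to a file quota, you can count its files. This is a sketch under assumptions: the helper name count_files is made up, and named environments are assumed to live in the envs folder below MAMBA_ROOT_PREFIX (the variable set by the initialization block in .bashrc).

```shell
# Hypothetical helper: count regular files and symbolic links below a
# directory, e.g. an environment, to compare against your file quota.
count_files() {
    find "$1" \( -type f -o -type l \) | wc -l | tr -d ' '
}

# Usage: one line per named environment, if any exist.
env_root="${MAMBA_ROOT_PREFIX:-$HOME/micromamba}/envs"
if [ -d "$env_root" ]; then
    for env in "$env_root"/*/; do
        [ -d "$env" ] || continue
        echo "$(count_files "$env") files in $env"
    done
fi
```

For an environment in a project directory, pass its path directly to count_files.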

To create a named environment in your HOME run:

micromamba create -n <environment name> 

If you choose to create an environment in your project file space run:

micromamba create -p /lustre/project/<project name>/<path to your environment>
# if you want to ensure group-wide read access run
# (-R includes everything below; X keeps directories traversable)
chmod -R g+rX /lustre/project/<project name>/<path to your environment>
# if you want to ensure group-wide write access, e.g. to install further
# software, run:
chmod -R g+w /lustre/project/<project name>/<path to your environment>

Note: if you allow write access to an environment in your project folder, you need to ensure that no concurrent install attempts take place!
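One possible way to avoid concurrent installs in a shared environment is a simple lock that every group member uses by convention. The sketch below is not a Micromamba feature: the wrapper name with_env_lock and the lock directory name .install.lock are assumptions made for this example.

```shell
# Hypothetical wrapper: run a command while holding a simple lock inside
# the environment directory. mkdir either creates the lock directory or
# fails, so only one caller at a time gets past it.
with_env_lock() {
    env_path="$1"; shift
    lock_dir="$env_path/.install.lock"
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "Cannot take lock $lock_dir (install in progress, or no write access?)" >&2
        return 1
    fi
    status=0
    "$@" || status=$?       # run the wrapped command, remember its result
    rmdir "$lock_dir"       # release the lock in any case
    return "$status"
}
```

You would prefix the install command with with_env_lock followed by the environment path. If a job is killed while holding the lock, the lock directory stays behind and has to be removed by hand.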

Other useful flags for creating environments are:

  • --pyc to automatically create Python byte code upon installation (this avoids compiling at a later time, but potentially doubles the file count)
  • -f to indicate a yaml or txt file specifying a software list.

After creating one or several environments micromamba is able to list them:

micromamba env list

Activating Environments

In order to activate an environment, that is to make the installed software available to you, run either:

# for a named environment
micromamba activate <environment name>
# or - for a path -
micromamba activate /lustre/project/<project name>/<path to your environment>

You will see the name of your environment (or, for an environment given by path, its base name) in your prompt.
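In job scripts there is no prompt to look at, so a script may want to verify the active environment itself. The following sketch assumes that activation exports the variable CONDA_PREFIX with the full path of the environment; the function name require_env is hypothetical.

```shell
# Hypothetical check for scripts: is the expected environment active?
# Assumes activation exports CONDA_PREFIX (the environment's full path).
# Accepts a full path, or a name that is compared with the base name.
require_env() {
    wanted="${1%/}"
    active="${CONDA_PREFIX:-}"
    active="${active%/}"
    case "$wanted" in
        /*) [ "$active" = "$wanted" ] ;;
        *)  [ -n "$active" ] && [ "$(basename "$active")" = "$wanted" ] ;;
    esac
}
```

A script could then stop early if require_env with the expected name or path fails, instead of running with the wrong software.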

Likewise, to leave your environment run:

micromamba deactivate

Adding Software to Environments (Installing Additional Packages)

Make sure to activate an environment before installing packages.

You can search for a particular software name with

micromamba search <name>

To install a single software package you may run

micromamba install <result from search>
# or a specific version
micromamba install <result from search>=<version>

To install bundles of packages, workflow developers usually provide text or yaml files. With a given file, e.g.

name: samtools  # name for your environment
channels:       # package sources
  - conda-forge
  - bioconda
dependencies:   # applications to be installed
  - python=3.7.3
  - samtools=1.9

you can install this bundle with the -f flag when creating your environment, or into an existing environment like this:

micromamba install -f <file name, yaml or plain text, line by line>
# or (necessary for unnamed environments, because of the abbreviated path in the prompt)
micromamba install -f <file name, yaml or plain text, line by line> -p <path to environment>
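For the plain text variant, the file simply lists one package per line. As a sketch, the following writes the two packages from the yaml example above into such a file and prints the matching install command; the helper name write_spec and the file name samtools-spec.txt are made up for this example.

```shell
# Sketch: write the two packages from the yaml example above as a plain
# text spec file (one package per line).
write_spec() {
    cat > "$1" <<'EOF'
python=3.7.3
samtools=1.9
EOF
}

spec_file="${TMPDIR:-/tmp}/samtools-spec.txt"
write_spec "$spec_file"
# Printed rather than executed, so you can check it before running it.
echo "micromamba install -f $spec_file -p <path to environment>"
```

A plain text file carries no channel list, so the channels are taken from your configuration (see ~/.condarc above).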