Archiving

Meeting the Archiving Requirement

Research projects are required by their respective funding organizations to archive their data (raw data, results, software workflows, etc.). In addition, HPC projects should not expect long-term storage of their research data on HPC filesystems and are hence advised to use the ZDV data management facilities.

Automatically Set Metadata

With every archiving operation, the following metadata are automatically associated with your data set:

  • Creator - the full user name of the person archiving the data in question
  • Publisher - always set to “Johannes Gutenberg-University”
  • Location - set to “Mainz, Germany”
  • Date - the date of the archiving operation, stored as the Unix timestamp of the operation
  • ExpiryDate - Date + 10 years
  • protected - set to “false” by default, which means that the data can still be changed (e.g. more data added to a collection)
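
You can inspect these metadata with the standard iRODS icommands; a minimal sketch, assuming the icommands are available and with /zone/home/<user>/<collection> as a placeholder for your actual collection path:

# list all attribute-value-unit (AVU) metadata attached to a collection
imeta ls -C /zone/home/<user>/<collection>

# query a single attribute, e.g. the expiry date
imeta ls -C /zone/home/<user>/<collection> ExpiryDate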

Metadata Stewardship with Schemas

To facilitate populating iRODS collections with metadata according to schemas, we provide a helper module.

You can create a schema file with an online tool:

JSON-Schemas to iRODS

Loading the module tools/imcs provides a script which can be called as follows:

schema2avu -j <json_file> -c <iRODS-path to iRODS collection>

Currently, nested schemas for complex data are not supported. As such nesting might be data specific, you may approach the HPC team to have the necessary feature added for your specific data.
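
As an illustration, a call might look like the following; the schema file name and the collection path are hypothetical placeholders:

module load tools/imcs
# populate the collection with AVU metadata derived from a flat JSON schema
schema2avu -j experiment_schema.json -c /zone/home/<user>/experiment_data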

Preparing to Archive

We suggest compressing and annotating data prior to archiving with the iRODS archive:

  • compressing saves transfer time
  • annotation eases the interpretation of retrieved data if an archive needs to be pulled back (see the sketch after this list).
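
Annotation can be as simple as a README file packed with the data, or AVU metadata attached to the archive after upload; a minimal sketch, assuming the iRODS icommands are available and with all names, paths and description texts as placeholders:

# pack a plain-text README alongside the data before compressing
echo "project X, raw data of runs 1-10, created $(date -I)" > <directoryname>/README.txt

# alternatively, attach descriptive AVU metadata to the uploaded archive
imeta add -d /zone/home/<user>/<archivename>.tar.gz description "raw data of runs 1-10"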

Compressing Directories

A smaller directory can be compressed in the standard way:

# assuming gzip compression
tar -czf <archivename>.tar.gz <directoryname>
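
Before uploading, a quick integrity check can save a round trip, for example:

# list the first few entries to verify the archive is readable
tar -tzf <archivename>.tar.gz | head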

You can speed up the compression on a login node by using a parallel compression tool like pigz:

module load tools/pigz
tar cf - <directoryname> | pigz -p 4 > <archivename>.tar.gz

If the directory you are working on is too big for a login node, you can run an interactive job instead:

module load tools/pigz
# an interactive job might look like:
srun -A <your account> -p parallel -C broadwell -t <appropriate time> -N 1 -c40 --pty bash -i
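  # on the allocated node; without -p, pigz uses all processors available to it by default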
  <some node>:$ tar -I pigz -cf <archivename>.tar.gz <directoryname>
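
After compression, transfer the archive into iRODS; a minimal sketch, assuming the icommands are available and with /zone/home/<user>/ as a placeholder for your archive collection:

# upload the archive and verify the transfer with a server-side checksum
iput -K <archivename>.tar.gz /zone/home/<user>/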