Archiving
Meeting the Archiving Requirement
Research projects are required by their respective funding organizations to archive their data (raw data, results, software workflows, etc.). In addition, HPC projects should not expect long-term storage of their research data on HPC filesystems and are therefore advised to use the ZDV data management facilities.
Automatically set Metadata
Every time you archive data, the following metadata are automatically associated with your data set:
Creator
- the full user name of the person archiving the data in question
Publisher
- always set to “Johannes Gutenberg-University”
Location
- set to “Mainz, Germany”
Date
- the date of the archiving act, stored as the Unix timestamp of the act
ExpiryDate
- Date + 10 years
protected
- by default this property is set to “false”, which means that the data can still be changed (e.g. more data added to a collection)
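To illustrate how a Unix timestamp and a date ten years later relate, here is a small sketch. It assumes GNU date and is not how the archive itself computes these values; the variable names are hypothetical.

```shell
# Illustration only (assumes GNU date); the archive sets these values for you.
date_ts=$(date +%s)                    # Unix timestamp of "now", like Date
expiry_ts=$(date -d "+10 years" +%s)   # ten years from now, like ExpiryDate

# Show both values in human-readable form.
echo "Date:       ${date_ts} ($(date -d "@${date_ts}"))"
echo "ExpiryDate: ${expiry_ts} ($(date -d "@${expiry_ts}"))"
```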
Metadata Stewardship with Schemas
To facilitate populating iRODS collections with metadata according to schemas, we provide a helper module.
You can create a schema file with an online tool:
JSON-Schemas to iRODS
Loading the module tools/imcs will provide a script which can be called like:
Preparing to archive
We suggest compressing and annotating data before archiving them in the iRODS archive:
- compressing saves transfer time
- annotation eases the interpretation of retrieved data (if an archive needs to be pulled back).
Compressing Directories
A smaller directory can be compressed in the standard way:
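A minimal sketch of such a compression, assuming tar with gzip support is available. The helper name compress_dir and the directory name in the usage line are hypothetical.

```shell
# compress_dir DIR: pack DIR into DIR.tar.gz next to it (hypothetical helper).
compress_dir() {
    local dir="${1%/}"   # strip a trailing slash, if any
    # -c create, -z gzip, -f output file;
    # -C changes into the parent directory so paths in the archive stay relative.
    tar -czf "${dir}.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Usage: compress_dir my_project    # creates my_project.tar.gz
```

The original directory is left untouched, so you can verify the archive with tar -tzf before removing anything.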
You may speed up the compression on a login node by using a parallel compression tool like pigz:
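One way to do this is to let tar write to a pipe and have pigz compress the stream. The sketch below assumes pigz is installed and falls back to plain gzip otherwise. The helper name and the default of 4 threads are hypothetical choices.

```shell
# compress_dir_parallel DIR [THREADS]: pack DIR into DIR.tar.gz using pigz
# when available (hypothetical helper; falls back to single-threaded gzip).
compress_dir_parallel() {
    local dir="${1%/}"
    local threads="${2:-4}"
    local compressor
    if command -v pigz >/dev/null 2>&1; then
        compressor="pigz -p ${threads}"   # -p sets the number of compression threads
    else
        compressor="gzip"
    fi
    # tar streams the uncompressed archive to stdout; the compressor reads stdin.
    tar -cf - -C "$(dirname "$dir")" "$(basename "$dir")" | $compressor > "${dir}.tar.gz"
}

# Usage: compress_dir_parallel my_project 8
```

Keep the thread count modest on a login node, since it is shared with other users.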
If the directory is too big to compress on a login node, you can run the compression in an interactive job instead:
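A sketch of requesting an interactive shell, assuming the cluster is scheduled with SLURM. The helper name, the partition and account arguments, and the default core count and wall time are all hypothetical; substitute the values valid for your project.

```shell
# start_interactive PARTITION ACCOUNT [CPUS] [WALLTIME]
# Assumption: SLURM is the scheduler. All argument values are placeholders.
start_interactive() {
    # --pty attaches your terminal to a shell running on the allocated compute node.
    srun --pty --partition="$1" --account="$2" \
         --cpus-per-task="${3:-8}" --time="${4:-02:00:00}" bash -i
}

# Usage: start_interactive <partition> <account> 8 02:00:00
```

Once the shell on the compute node is open, the compression can be run there with as many threads as cores were requested.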