Lustre

Lustre is a high-performance storage system designed to provide fast, parallel access to large datasets across many compute nodes.

Projects on MOGON get access to the Lustre project fileserver if it is requested during the application process.

Did you know?

There is no backup on any of our Lustre file servers. They are not an archive system. So please remove data that is no longer needed from the file servers.

Architecture

Unlike a typical network filesystem, Lustre distributes both metadata (data about your actual data, such as the file name, file size, owner and permissions, timestamps, directory location, and striping information) and file data across multiple servers, allowing many users and jobs to read and write simultaneously at high speed.

The actual data within a file is stored at block level on object storage targets (OSTs), while a metadata target (MDT) holds the metadata for a file. A Lustre filesystem consists of multiple OSTs and at least one MDT. A server providing MDTs is called a metadata server (MDS), while a server providing OSTs is called an object storage server (OSS).

Lustre Usage Guideline

Best Performance

Lustre performs best when your workload has the following characteristics:

  1. High-throughput data movement

    • Reading/writing large files (GB–TB)
    • Sustained, continuous I/O (streaming)
  2. Parallel access

    • Many processes accessing different parts of the same file
    • MPI jobs using coordinated I/O
    • Parallel jobs (MPI, multi-node)
  3. Sequential access patterns

    • Reading or writing data in order
    • Long, contiguous operations rather than small chunks

Bad Matchup

On the other hand, you should avoid (or rethink) the following workflows since these patterns often lead to poor performance:

  1. Many small files

    • Thousands to millions of files (<1 MB)
    • Frequent file creation/deletion

    Why it’s bad: Heavy load on metadata servers → bottlenecks

  2. Small, frequent I/O operations

    • Writing/reading tiny chunks (KB-scale)
    • Repeated open/write/close cycles

    Why it’s bad: High overhead per operation → low effective throughput

  3. Random I/O patterns

    • Jumping to different offsets in a file
    • Non-sequential reads/writes

    Why it’s bad: Breaks read-ahead and parallel efficiency → slower access

  4. Serialized access to shared files

    • Many processes writing to the same region of a file
    • Lock contention or uncoordinated writes

    Why it’s bad: Eliminates parallelism → creates bottlenecks

  5. Metadata-heavy workloads

    • Scanning huge directories (ls on millions of files)
    • Constant file status checks

    Why it’s bad: Metadata servers become the limiting factor
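
As a small illustration of item 2 (repeated open/write/close cycles), the sketch below writes the same lines from a shell loop in two ways. The file names and the line count are made up for this example, and it runs in a temporary directory rather than on Lustre, so it only shows the pattern and not the actual performance difference.

```shell
#!/bin/sh
# Illustration only: runs in a throwaway directory, not on Lustre.
workdir=$(mktemp -d)

# Pattern to avoid: every iteration opens, appends to, and closes the file.
i=1
while [ "$i" -le 1000 ]; do
    echo "sample $i" >> "$workdir/per_line.txt"
    i=$((i + 1))
done

# Preferred: the redirection is attached to the whole loop, so the file is
# opened once and all output goes through that single descriptor.
i=1
while [ "$i" -le 1000 ]; do
    echo "sample $i"
    i=$((i + 1))
done > "$workdir/batched.txt"

# Both files have the same content; only the I/O pattern differs.
cmp "$workdir/per_line.txt" "$workdir/batched.txt" && echo "identical"
```

The same idea applies inside applications: collect output in a buffer and write it in few large operations instead of many tiny ones.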

Striping

To take advantage of Lustre’s parallelism, a single file can be stored across several OSTs. This is called striping. It allows a file to be read from multiple sources in parallel, which reduces access times and increases the available bandwidth for very large files.

You can find out the striping pattern of a file or directory using:

lfs getstripe /lustre/project/<path-to-my-data>

The default on MOGON is stripe_count=1 and stripe_size=1MiB, which means that each file is stored on a single OST, with no parallel distribution across storage targets. This is a design choice for general stability. To tune performance, users can control how their files are laid out. The following command sets the stripe count (how many OSTs your file uses) and the stripe size (the size of each chunk); the second line is an example with concrete values:

lfs setstripe -S <stripe-size> -c <stripe-count> /lustre/project/<path-to-my-data>
lfs setstripe -S 16M -c 4 /lustre/project/nhr-project/janedoe/data
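
A common way to apply this, sketched below, is to set the layout on a directory: files created in it afterwards inherit the layout, while files that already exist keep the one they have. The helper name `set_dir_striping` and the `DRY_RUN` switch are made up for this example and are not part of Lustre; the path is the example path from above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): apply a stripe layout to a
# directory and show the result. With DRY_RUN=1 it only prints the command.
set_dir_striping() {
    dir=$1
    size=$2
    count=$3
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "lfs setstripe -S $size -c $count $dir"
        return 0
    fi
    # -d makes getstripe report the directory's own default layout
    # instead of listing the layout of every file inside it.
    lfs setstripe -S "$size" -c "$count" "$dir" && lfs getstripe -d "$dir"
}

# Preview the command without touching the filesystem.
DRY_RUN=1
set_dir_striping /lustre/project/nhr-project/janedoe/data 16M 4
```

Unsetting `DRY_RUN` and calling the function on a Lustre node would run the two `lfs` commands for real.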

Should I Change the Striping?

You should rewrite the layout if:

  • Large sequential I/O performance matters (e.g., your application writes big contiguous chunks, tens or hundreds of MB per write)
  • You want a higher stripe count (e.g., you actually have concurrent writers and the file is large enough to benefit)
  • You’re optimizing for checkpoint throughput (e.g., higher and more stable sustained bandwidth)

Bad reasons would be:

  • Just because the defaults look odd
  • Small files (<100 MB usually)
  • Mixed/random I/O workloads
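
The size criterion can be checked before touching the layout. The following is a minimal sketch; the helper name `worth_restriping` is hypothetical, and the 100 MB threshold is simply the rule of thumb from the list above, not a hard limit.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file is large enough for a
# non-default striping to be worth considering (rule of thumb: 100 MB).
worth_restriping() {
    threshold=$((100 * 1024 * 1024))
    # wc -c prints the size in bytes; returns 2 if the file cannot be read.
    size=$(wc -c < "$1") || return 2
    if [ "$size" -ge "$threshold" ]; then
        echo "candidate"
    else
        echo "keep default"
    fi
}

# Demo on a tiny temporary file, which falls below the threshold.
demo=$(mktemp)
printf 'tiny' > "$demo"
worth_restriping "$demo"
rm -f "$demo"
```

Size alone is not sufficient: a large file with a mixed or random access pattern is still a poor candidate.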

You can consider these stripe sizes as a rule of thumb:

Workload               Typical Stripe Size
Small Files            1 M – 4 M
General HPC            4 M – 16 M
Large Sequential I/O   16 M – 64 M+

Network Filesystems

Here you can find a short overview of the MOGON home and project directory, including quotas.

Local Node Storage

One solution for temporary small files could be writing to the local storage of compute nodes.
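
One way this could look inside a job script is sketched below: the small files are written to a node-local scratch directory, and only a single archive ends up on the shared filesystem. Using `$TMPDIR` (falling back to `/tmp`) as the node-local location and the `RESULT_DIR` variable are assumptions made for this example; check the local storage documentation for the actual path on MOGON.

```shell
#!/bin/sh
# Assumption: $TMPDIR (or /tmp) points at node-local storage.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/job_scratch.XXXXXX")

# Assumption: RESULT_DIR would be a project directory on Lustre. It
# defaults to a throwaway directory so that the sketch runs anywhere.
RESULT_DIR=${RESULT_DIR:-$(mktemp -d)}

# Stand-in for an application that produces many small files.
i=1
while [ "$i" -le 200 ]; do
    echo "partial result $i" > "$scratch/part_$i.txt"
    i=$((i + 1))
done

# Bundle everything into one archive, so the shared filesystem sees a
# single sequential write instead of 200 file creations.
tar -cf "$RESULT_DIR/parts.tar" -C "$scratch" .

# Clean up the node-local scratch space.
rm -rf "$scratch"
```

The archive can later be unpacked on local storage again, which keeps the small files off Lustre in both directions.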