Deploying GlusterFS distributed storage for efficient checkpointing
In this tutorial, you will use Compute Cloud virtual machines in Nebius AI to set up a GlusterFS distributed storage cluster.
Background
This section provides useful background information. To get started with the tutorial straight away, go to Steps.
Why GlusterFS?
GlusterFS is particularly useful for checkpointing in model training. It ensures high bandwidth when multiple training nodes write and read the same checkpoint files in parallel; the more training and storage nodes, the better. The table below shows the results of IOR benchmark tests (see the IOR documentation):
| Setup | Write bandwidth, MiB/s | Read bandwidth, MiB/s |
|---|---|---|
| 10 client VMs (4 vCPUs, 8 GB RAM); 10 storage VMs (8 vCPUs, 8 GB RAM, high-performance SSDs) | 2,063 | 1,918 |
| 10 client VMs (4 vCPUs, 8 GB RAM); 10 storage VMs (8 vCPUs, 8 GB RAM, network SSDs) | 1,996 | 2,026 |
| 10 client VMs (40 vCPUs, 120 GB RAM); 10 storage VMs (8 vCPUs, 8 GB RAM, high-performance SSDs) | 3,191 | 5,640 |
| 30 client VMs (40 vCPUs, 120 GB RAM); 30 storage VMs (8 vCPUs, 8 GB RAM, high-performance SSDs) | 10,316 | 14,001 |
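For a quick, rough sanity check of aggregate write bandwidth on your own deployment (not a substitute for IOR), you can write a file from every client in parallel and read the rate that `dd` reports. This is a sketch, assuming the volume is mounted at `/mnt/glusterfs` and a `@clients` clush group is configured; both are set up later in this tutorial:

```
# Each client writes a 1 GiB file and flushes it to storage;
# dd reports the write rate per client on completion.
clush -w @clients 'dd if=/dev/zero of=/mnt/glusterfs/bw-test-$(hostname).bin bs=1M count=1024 conv=fsync'
# Remove the test files afterwards:
clush -w @clients 'rm -f /mnt/glusterfs/bw-test-$(hostname).bin'
```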
Volume types and redundancy
A GlusterFS volume is a logical collection of directories (bricks) on storage VMs that store your files. Depending on the volume type, GlusterFS can distribute, replicate, or disperse files across the storage VMs. For more details on volume types, see the GlusterFS documentation.
In this tutorial, you will create a distributed volume by default. This means that each file is stored on one of the storage VMs that belong to the volume. There is no software-level redundancy: if a storage VM becomes unavailable, so do the files stored on it. Hardware-level redundancy is still ensured by the underlying Nebius AI disks.
You will be able to change the volume type at a certain point in the tutorial, before the volume is created.
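For illustration, here is roughly what that choice looks like at volume-creation time. This is a sketch with hypothetical brick paths; in this tutorial, the actual command is defined in `metadata/cloud-init.yaml`, as described below:

```
# Distributed volume (the default in this tutorial): each file is placed
# on exactly one brick, with no software-level redundancy.
gluster volume create stripe-volume \
    gluster01:/data/brick gluster02:/data/brick gluster03:/data/brick

# Replicated volume: every file is stored on all three bricks, so the data
# survives the loss of a storage VM at the cost of 3x disk usage.
gluster volume create stripe-volume replica 3 \
    gluster01:/data/brick gluster02:/data/brick gluster03:/data/brick
```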
Steps
In this tutorial, you will:
- Create a GlusterFS volume using a ready-made Terraform configuration.
- Set up client VMs: install GlusterFS client and mount the volume.
The tutorial also covers optional steps: testing the volume and deleting the resources you created.
Costs
The cost of this infrastructure includes:
- Fees for continuously running VMs and disks (see Compute Cloud pricing).
- Fees for using public IP addresses and outgoing traffic (see Virtual Private Cloud pricing).
Prepare the environment
1. If you do not have the Nebius AI command line interface yet, install and configure it.

2. Install Terraform and configure Nebius AI Terraform provider.

   Warning

   Nebius AI Terraform provider is in beta and may be unstable. If you are experiencing issues with it, contact support.

3. Create an SSH key pair:

   ```
   ssh-keygen -t ed25519
   ```

   We recommend leaving the key file name unchanged.
Create a GlusterFS volume
On your local machine:
1. Clone the nebius-architect-solution-library repository from GitHub and go to the `glusterfs-cluster-ubuntu` directory:

   ```
   git clone https://github.com/nebius/nebius-architect-solution-library.git
   cd ./nebius-architect-solution-library/glusterfs-cluster-ubuntu
   ```

2. See the `variables.tf` file for the list of variables and their default values used in the configuration. To override the default values, create a `terraform.tfvars` file using the example below and modify it with your values:

   ```
   # Number of GlusterFS server VMs
   storage_node_per_zone = 5 # At least 2

   # Storage disks on server VMs (in addition to boot disks)
   disk_count_per_vm = 2
   disk_type = "network-ssd" # "network-ssd-io-m3" - similar performance
   disk_size = 512 # In GiB

   # Computing resources of server VMs
   storage_cpu_count = 12
   storage_memory_count = 16 # In GB

   # SSH public key for server VMs
   local_pubkey_path = "../id_ed25519.pub"
   ```

   For details about disk types, see Disks.

   To keep the default value for a variable, remove its line from `terraform.tfvars`.

   Note

   The configuration in this tutorial uses a distributed GlusterFS volume, which is the default type: each file is stored on only one storage VM, and there is no redundancy at the software level. If you need another type of volume, look for the `gluster volume create` command in `metadata/cloud-init.yaml` and modify it according to the GlusterFS documentation.
3. Apply the configuration:

   1. Initialize Terraform:

      ```
      terraform init
      ```

   2. Check the Terraform file configuration:

      ```
      terraform validate
      ```

   3. Preview the list of cloud resources to be created:

      ```
      terraform plan
      ```

   4. Create the resources:

      ```
      terraform apply
      ```

   This will create the VMs for distributed data storage (`glusterfs01`, `glusterfs02`, ...). They will have GlusterFS servers installed and a distributed volume set up.
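Optionally, you can verify the cluster state right after deployment. A minimal sketch, assuming the storage VMs have public IP addresses and you connect with the username configured in the Terraform variables (otherwise, connect through a client VM in the same subnet):

```
ssh <username>@<glusterfs01_public_IP_address>
# On the storage VM:
sudo gluster peer status    # all other storage VMs should be connected
sudo gluster volume info    # shows the volume type and its bricks
sudo gluster volume status  # shows whether all bricks are online
```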
Set up client VMs
This section assumes that you have created multiple client VMs: virtual machines that you will connect to the GlusterFS storage. If you have not created them yet, follow this guide.
Warning

The client VMs must be in the same `default-eu-north1-c` subnet as the storage VMs.
To set up all client VMs, perform the following steps for each VM:
1. Get the VM's public IP address:

   ```
   ncp compute instance get <VM's_name_or_ID>
   ```

   The address will be shown in the `network_interfaces.primary_v4_address.one_to_one_nat.address` field.

2. Connect to the VM over SSH:

   ```
   ssh <username>@<VM's_public_IP_address>
   ```

3. Run the superuser shell:

   ```
   sudo -i
   ```

4. Install the GlusterFS client:

   ```
   apt-get update && apt-get install glusterfs-client
   ```

5. Create the `/mnt/glusterfs` directory to mount the volume into:

   ```
   mkdir /mnt/glusterfs
   ```

6. Mount the volume:

   ```
   mount -t glusterfs gluster01:/stripe-volume /mnt/glusterfs
   ```

7. To keep the volume mounted after restarts, add it to `/etc/fstab`:

   ```
   echo "gluster01:/stripe-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0" >> /etc/fstab
   ```
Now you can run your model training workload on client VMs and use `/mnt/glusterfs` on all of them to write and read checkpoint files.
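If you have many client VMs, you can also push the same setup to all of them in parallel instead of repeating the steps by hand. This is a sketch, assuming `clush` is installed on the machine you run it from, SSH to all clients works without prompts, and a `@clients` ClusterShell group is configured (a sketch of defining one appears in the testing section below):

```
# Install the client, create the mount point, mount, and persist on every VM:
clush -w @clients 'sudo apt-get update && sudo apt-get install -y glusterfs-client'
clush -w @clients 'sudo mkdir -p /mnt/glusterfs'
clush -w @clients 'sudo mount -t glusterfs gluster01:/stripe-volume /mnt/glusterfs'
clush -w @clients 'echo "gluster01:/stripe-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0" | sudo tee -a /etc/fstab'
```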
How to test the volume
Checking that the storage is available
While connected to `client01` over SSH:

1. Create a text file:

   ```
   cat > /mnt/glusterfs/test.txt <<EOF
   Hello, GlusterFS!
   EOF
   ```

2. Make sure that the file is available on all client VMs:

   ```
   clush -w @clients sha256sum /mnt/glusterfs/test.txt
   ```

   Result:

   ```
   client01: 878fd15130e712e21fb35ec0978cb7194abc54a465848ff28356d10c0f79fdb4 /mnt/glusterfs/test.txt
   client02: 878fd15130e712e21fb35ec0978cb7194abc54a465848ff28356d10c0f79fdb4 /mnt/glusterfs/test.txt
   ...
   ```
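The `clush` commands in this tutorial rely on a `@clients` node group. A minimal sketch of defining it with ClusterShell's default local group source, assuming hostnames `client01` to `client05`:

```
# Install ClusterShell (provides the clush and nodeset commands):
sudo apt-get install -y clustershell
# Define the "clients" group in the default local group source:
echo "clients: client[01-05]" | sudo tee -a /etc/clustershell/groups.d/local.cfg
# Check that the group expands to the expected node set:
nodeset -f @clients
```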
Running the IOR benchmark test
While connected to `client01` over SSH:
1. Install IOR:

   1. Install dependencies on all client VMs in parallel:

      ```
      clush -w @clients sudo apt-get install -y autoconf pkg-config libtool mpich make
      ```

   2. Clone the IOR repository into the volume:

      ```
      cd /mnt/glusterfs
      git clone https://github.com/hpc/ior.git
      ```

   3. Create a directory to build IOR into:

      ```
      cd ior
      mkdir prefix
      ```

   4. Generate the IOR configuration script:

      ```
      ./bootstrap
      ```

   5. Run the script:

      ```
      ./configure --disable-dependency-tracking --prefix /mnt/glusterfs/ior/prefix
      ```

   6. Build and install IOR:

      ```
      make
      make install
      ```
2. Set up your environment for the test:

   1. Create a directory for the files that will be written and read during the test:

      ```
      mkdir -p /mnt/glusterfs/benchmark/ior
      ```

   2. Create an environment variable with a comma-delimited list of the client VMs' hostnames. For example, if your VMs are called `client01` to `client05`, their default hostnames are `client01.eu-north1.internal` to `client05.eu-north1.internal`, and the variable can be created like this:

      ```
      export GLUSTERFS_NODES=$(seq -f 'client%02g.eu-north1.internal' -s ',' 1 5)
      ```
3. Run the test:

   ```
   mpirun -hosts $GLUSTERFS_NODES -ppn 16 \
       /mnt/glusterfs/ior/prefix/bin/ior \
       -o /mnt/glusterfs/benchmark/ior/ior_file -F \
       -t 1m -b 16m -s 16 -C
   ```
The test writes and reads files in the GlusterFS volume with the following parameters:

- `-ppn 16`: 16 processes on each VM run the test in parallel.
- `-o /mnt/glusterfs/benchmark/ior/ior_file -F`: Each process sequentially writes to its own file, `/mnt/glusterfs/benchmark/ior/ior_file.<process_number>`; this is called "file-per-process" (`-F`).
- `-t 1m -b 16m`: Write and read operations are performed in 1 MiB (2²⁰ bytes) transfers and 16 MiB blocks.
- `-s 16`: Each file consists of 16 one-block segments. For 5 client VMs, the total amount of data written is 16 MiB × 16 segments × 16 processes × 5 VMs = 20 GiB.
- `-C`: Instead of reading from its own file, each process reads a file written by a neighboring client VM to bypass the page cache. This ensures that IOR measures the performance of the file system, not the VM's memory.
For more details, see the article about the first steps with IOR and the list of its options in the IOR documentation.
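You can also measure the single-shared-file pattern, where all processes write to one file at different offsets, which is closer to frameworks that save a single consolidated checkpoint. This sketch simply drops the `-F` flag and keeps the other parameters unchanged (the output file name here is illustrative):

```
mpirun -hosts $GLUSTERFS_NODES -ppn 16 \
    /mnt/glusterfs/ior/prefix/bin/ior \
    -o /mnt/glusterfs/benchmark/ior/ior_shared_file \
    -t 1m -b 16m -s 16 -C
```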
How to delete the resources you created
While the storage VMs are running, you are charged for them. If you do not need them anymore, delete them.
To delete the storage VMs, on your local machine, go to the directory with the Terraform configuration (see Create a GlusterFS volume) and run the following command:

```
terraform destroy
```