Deploying GlusterFS distributed storage for efficient checkpointing
In this tutorial, you will use Compute Cloud virtual machines in Nebius AI to set up a GlusterFS distributed storage cluster.
Background
This section provides useful background information. To get started with the tutorial straight away, go to Steps.
Why GlusterFS?
GlusterFS is particularly useful for checkpointing in model training. It ensures high bandwidth when multiple training nodes write and read the same checkpoint files in parallel; the more training and storage nodes, the better. The table below shows the results of IOR benchmark tests (see the IOR documentation):
| Setup | Write bandwidth, MiB/s | Read bandwidth, MiB/s |
|---|---|---|
| 10 client VMs (4 vCPUs, 8 GB RAM); 10 storage VMs (8 vCPUs, 8 GB RAM, high-performance SSDs) | 2,063 | 1,918 |
| 10 client VMs (4 vCPUs, 8 GB RAM); 10 storage VMs (8 vCPUs, 8 GB RAM, network SSDs) | 1,996 | 2,026 |
| 10 client VMs (40 vCPUs, 120 GB RAM); 10 storage VMs (8 vCPUs, 8 GB RAM, high-performance SSDs) | 3,191 | 5,640 |
| 30 client VMs (40 vCPUs, 120 GB RAM); 30 storage VMs (8 vCPUs, 8 GB RAM, high-performance SSDs) | 10,316 | 14,001 |
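For a quick, rough sanity check of aggregate write bandwidth on your own deployment (not a substitute for IOR), you can write a file from every client in parallel and read the rate that `dd` reports. This is a sketch, assuming the volume is mounted at `/mnt/glusterfs` and a `@clients` clush group is configured; both are set up later in this tutorial:

```
# Each client writes a 1 GiB file and flushes it to storage;
# dd reports the write rate per client on completion.
clush -w @clients 'dd if=/dev/zero of=/mnt/glusterfs/bw-test-$(hostname).bin bs=1M count=1024 conv=fsync'
# Remove the test files afterwards:
clush -w @clients 'rm -f /mnt/glusterfs/bw-test-$(hostname).bin'
```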
Volume types and redundancy
A GlusterFS volume is a logical collection of directories (bricks) on storage VMs that store your files. Depending on the volume type, GlusterFS can distribute, replicate, or disperse files across the storage VMs. For more details on volume types, see the GlusterFS documentation.
In this tutorial, you will create a distributed volume by default. This means that each file is stored on one of the storage VMs that belong to the volume. There is no software-level redundancy: if a storage VM becomes unavailable, so do the files stored on it. Hardware-level redundancy is still ensured by the underlying Nebius AI disks.
You will be able to change the volume type at a certain point in the tutorial, before the volume is created.
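For illustration, here is roughly what that choice looks like at volume-creation time. This is a sketch with hypothetical brick paths; in this tutorial, the actual command is defined in `metadata/cloud-init.yaml`, as described below:

```
# Distributed volume (the default in this tutorial): each file is placed
# on exactly one brick, with no software-level redundancy.
gluster volume create stripe-volume \
    gluster01:/data/brick gluster02:/data/brick gluster03:/data/brick

# Replicated volume: every file is stored on all three bricks, so the data
# survives the loss of a storage VM at the cost of 3x disk usage.
gluster volume create stripe-volume replica 3 \
    gluster01:/data/brick gluster02:/data/brick gluster03:/data/brick
```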
Steps
In this tutorial, you will:
- Create a GlusterFS volume using a ready-made Terraform configuration.
- Set up client VMs: install GlusterFS client and mount the volume.
The tutorial also covers optional steps: testing the volume and deleting the resources you created.
Costs
The cost of this infrastructure includes:
- Fees for continuously running VMs and disks (see Compute Cloud pricing).
- Fees for using public IP addresses and outgoing traffic (see Virtual Private Cloud pricing).
Prepare the environment
1. If you do not have the Nebius AI command line interface yet, install and configure it.

2. Install Terraform and configure Nebius AI Terraform provider.

   Warning

   Nebius AI Terraform provider is in beta and may be unstable. If you are experiencing issues with it, contact support.

3. Create an SSH key pair:

   ```
   ssh-keygen -t ed25519
   ```

   We recommend leaving the key file name unchanged.
Create a GlusterFS volume
On your local machine:
1. Clone the nebius-architect-solution-library repository from GitHub and go to the `glusterfs-cluster-ubuntu` directory:

   ```
   git clone https://github.com/nebius/nebius-architect-solution-library.git
   cd ./nebius-architect-solution-library/glusterfs-cluster-ubuntu
   ```

2. See the `variables.tf` file for the list of variables and their default values used in the configuration. To override the default values, create a `terraform.tfvars` file using the example below and modify it with your values:

   ```
   # Number of GlusterFS server VMs
   storage_node_per_zone = 5 # At least 2

   # Storage disks on server VMs (in addition to boot disks)
   disk_count_per_vm = 2
   disk_type = "network-ssd" # "network-ssd-io-m3" - similar performance
   disk_size = 512 # In GiB

   # Computing resources of server VMs
   storage_cpu_count = 12
   storage_memory_count = 16 # In GB

   # SSH public key for server VMs
   local_pubkey_path = "../id_ed25519.pub"
   ```

   For details about disk types, see Disks.

   To keep the default value for a variable, remove its line from `terraform.tfvars`.

   Note

   The configuration in this tutorial uses a distributed GlusterFS volume, which is the default type: each file is stored on only one storage VM, and there is no redundancy at the software level. If you need another type of volume, look for the `gluster volume create` command in `metadata/cloud-init.yaml` and modify it according to the GlusterFS documentation.
3. Apply the configuration:

   1. Initialize Terraform:

      ```
      terraform init
      ```

   2. Check the Terraform file configuration:

      ```
      terraform validate
      ```

   3. Preview the list of cloud resources to be created:

      ```
      terraform plan
      ```

   4. Create the resources:

      ```
      terraform apply
      ```

   This will create the VMs for distributed data storage (`glusterfs01`, `glusterfs02`, ...). They will have GlusterFS servers installed and a distributed volume set up.
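Optionally, you can verify the cluster state right after deployment. A minimal sketch, assuming the storage VMs have public IP addresses and you connect with the username configured in the Terraform variables (otherwise, connect through a client VM in the same subnet):

```
ssh <username>@<glusterfs01_public_IP_address>
# On the storage VM:
sudo gluster peer status    # all other storage VMs should be connected
sudo gluster volume info    # shows the volume type and its bricks
sudo gluster volume status  # shows whether all bricks are online
```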
Set up client VMs
This section assumes that you have created multiple client VMs: virtual machines that you will connect to the GlusterFS storage. If you have not created them yet, follow this guide.
Warning

The client VMs must be in the same `default-eu-north1-c` subnet as the storage VMs.
To set up all client VMs, perform the following steps for each VM:
1. Get the VM's public IP address:

   ```
   ncp compute instance get <VM's_name_or_ID>
   ```

   The address will be shown in the `network_interfaces.primary_v4_address.one_to_one_nat.address` field.

2. Connect to the VM over SSH:

   ```
   ssh <username>@<VM's_public_IP_address>
   ```

3. Run the superuser shell:

   ```
   sudo -i
   ```

4. Install the GlusterFS client:

   ```
   apt-get update && apt-get install glusterfs-client
   ```

5. Create the `/mnt/glusterfs` directory to mount the volume into:

   ```
   mkdir /mnt/glusterfs
   ```

6. Mount the volume:

   ```
   mount -t glusterfs gluster01:/stripe-volume /mnt/glusterfs
   ```

7. To keep the volume mounted after restarts, add it to `/etc/fstab`:

   ```
   echo "gluster01:/stripe-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0" >> /etc/fstab
   ```
Now you can run your model training workload on client VMs and use `/mnt/glusterfs` on all of them to write and read checkpoint files.
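If you have many client VMs, you can also push the same setup to all of them in parallel instead of repeating the steps by hand. This is a sketch, assuming `clush` is installed on the machine you run it from, SSH to all clients works without prompts, and a `@clients` ClusterShell group is configured (a sketch of defining one appears in the testing section below):

```
# Install the client, create the mount point, mount, and persist on every VM:
clush -w @clients 'sudo apt-get update && sudo apt-get install -y glusterfs-client'
clush -w @clients 'sudo mkdir -p /mnt/glusterfs'
clush -w @clients 'sudo mount -t glusterfs gluster01:/stripe-volume /mnt/glusterfs'
clush -w @clients 'echo "gluster01:/stripe-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0" | sudo tee -a /etc/fstab'
```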
How to test the volume
Checking that the storage is available
While connected to `client01` over SSH:

1. Create a text file:

   ```
   cat > /mnt/glusterfs/test.txt <<EOF
   Hello, GlusterFS!
   EOF
   ```

2. Make sure that the file is available on all client VMs:

   ```
   clush -w @clients sha256sum /mnt/glusterfs/test.txt
   ```

   Result:

   ```
   client01: 878fd15130e712e21fb35ec0978cb7194abc54a465848ff28356d10c0f79fdb4 /mnt/glusterfs/test.txt
   client02: 878fd15130e712e21fb35ec0978cb7194abc54a465848ff28356d10c0f79fdb4 /mnt/glusterfs/test.txt
   ...
   ```
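The `clush` commands in this tutorial rely on a `@clients` node group. A minimal sketch of defining it with ClusterShell's default local group source, assuming hostnames `client01` to `client05`:

```
# Install ClusterShell (provides the clush and nodeset commands):
sudo apt-get install -y clustershell
# Define the "clients" group in the default local group source:
echo "clients: client[01-05]" | sudo tee -a /etc/clustershell/groups.d/local.cfg
# Check that the group expands to the expected node set:
nodeset -f @clients
```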
Running the IOR benchmark test
While connected to `client01` over SSH:
1. Install IOR:

   1. Install dependencies on all client VMs in parallel:

      ```
      clush -w @clients sudo apt-get install -y autoconf pkg-config libtool mpich make
      ```

   2. Clone the IOR repository into the volume:

      ```
      cd /mnt/glusterfs
      git clone https://github.com/hpc/ior.git
      ```

   3. Create a directory to build IOR into:

      ```
      cd ior
      mkdir prefix
      ```

   4. Generate the IOR configuration script:

      ```
      ./bootstrap
      ```

   5. Run the script:

      ```
      ./configure --disable-dependency-tracking --prefix /mnt/glusterfs/ior/prefix
      ```

   6. Build and install IOR:

      ```
      make
      make install
      ```
2. Set up your environment for the test:

   1. Create a directory for the files that will be written and read during the test:

      ```
      mkdir -p /mnt/glusterfs/benchmark/ior
      ```

   2. Create an environment variable with a comma-delimited list of the client VMs' hostnames. For example, if your VMs are called `client01` to `client05`, their default hostnames are `client01.eu-north1.internal` to `client05.eu-north1.internal`, and the variable can be created like this:

      ```
      export GLUSTERFS_NODES=$(seq -f 'client%02g.eu-north1.internal' -s ',' 1 5)
      ```
3. Run the test:

   ```
   mpirun -hosts $GLUSTERFS_NODES -ppn 16 \
       /mnt/glusterfs/ior/prefix/bin/ior \
       -o /mnt/glusterfs/benchmark/ior/ior_file -F \
       -t 1m -b 16m -s 16 -C
   ```
The test writes and reads files in the GlusterFS volume with the following parameters:

- `-ppn 16`: 16 processes on each VM run the test in parallel.
- `-o /mnt/glusterfs/benchmark/ior/ior_file -F`: Each process sequentially writes to its own file, `/mnt/glusterfs/benchmark/ior/ior_file.<process_number>`; this is called "file-per-process" (`-F`).
- `-t 1m -b 16m`: Write and read operations are performed in 1 MiB (2²⁰ bytes) transfers and 16 MiB blocks.
- `-s 16`: Each file consists of 16 one-block segments. For 5 client VMs, the total amount of data written is 16 MiB × 16 segments × 16 processes × 5 VMs = 20 GiB.
- `-C`: Instead of reading from its own file, each process reads a file written by a neighboring client VM to bypass the page cache. This ensures that IOR measures the performance of the file system, not the VM's memory.
For more details, see the article about the first steps with IOR and the list of its options in the IOR documentation.
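You can also measure the single-shared-file pattern, where all processes write to one file at different offsets, which is closer to frameworks that save a single consolidated checkpoint. This sketch simply drops the `-F` flag and keeps the other parameters unchanged (the output file name here is illustrative):

```
mpirun -hosts $GLUSTERFS_NODES -ppn 16 \
    /mnt/glusterfs/ior/prefix/bin/ior \
    -o /mnt/glusterfs/benchmark/ior/ior_shared_file \
    -t 1m -b 16m -s 16 -C
```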
How to delete the resources you created
While the storage VMs are running, you are charged for them. If you do not need them anymore, delete them.
To delete the storage VMs, on your local machine, go to the directory with the Terraform configuration (see Create a GlusterFS volume) and run the following command:

```
terraform destroy
```