Migrating databases from a third-party PostgreSQL cluster to Managed Service for PostgreSQL
- Migrating data using logical replication
- Getting started
- Set up the source cluster
- Export the database schema in the source cluster
- Restore the database schema in the target cluster
- Create a publication and subscription
- Migrate PostgreSQL sequences after replication
- Delete the subscription and switch over the load
- Delete the resources you created
- Transferring data by creating and restoring a logical dump
Note
Data migration to a third-party PostgreSQL cluster is described in Migrating databases from Managed Service for PostgreSQL.
You can migrate data from a third-party source cluster to a Managed Service for PostgreSQL target cluster in the following ways:
- Migrating data using logical replication.
  Logical replication uses the subscription mechanism, which lets you migrate data to the target cluster with minimal downtime.
- Transferring data by creating and restoring a logical dump.
  A logical dump is a file with a set of commands that you can run one by one to restore the state of a database. It is created using the pg_dump utility. To ensure that a logical dump is complete, switch the source cluster to read-only mode before creating it. Use this method only if, for some reason, it is not possible to transfer data using logical replication.
Migrating data using logical replication
Logical replication is supported as of PostgreSQL version 10. Besides migrating data between the same PostgreSQL versions, logical replication lets you migrate to newer PostgreSQL versions.
In Managed Service for PostgreSQL clusters, subscriptions can be used by the database owner (a user created together with the cluster) and users with the mdb_admin role for the cluster.
Migration stages:
- Set up the source cluster.
- Export the database schema in the source cluster.
- Restore the database schema in the target cluster.
- Create a PostgreSQL publication and subscription.
- Migrate the PostgreSQL sequences after replication.
- Delete the subscription and switch over the load.
If you no longer need the resources you created, delete them.
Getting started
Create the required resources:
Create a Managed Service for PostgreSQL target cluster with any suitable configuration. In this case:
- The PostgreSQL version must be the same as or higher than the version in the source cluster. You cannot migrate while downgrading the PostgreSQL version.
- When creating a cluster, specify the same database name as in the source cluster.
- Enable the same PostgreSQL extensions as in the source cluster.
Set up the source cluster
- Specify the required SSL and WAL settings in the postgresql.conf file. On Debian and Ubuntu Linux distributions, the default path to this file is /etc/postgresql/<PostgreSQL version>/main/postgresql.conf.
  - We recommend using SSL for migrating data: it will not only encrypt the data but also compress it. For more information, see SSL Support and Database Connection Control Functions in the PostgreSQL documentation. To enable SSL, set the appropriate value in the configuration:
    ssl = on # on, off
  - Change the logging level for the Write-Ahead Log (WAL) to add the information needed for logical replication. To do this, find the line with the wal_level setting in postgresql.conf, uncomment it if required, and set the value to logical:
    wal_level = logical # minimal, replica, or logical
- Configure authentication of hosts in the source cluster. To do this, add the Managed Service for PostgreSQL cluster hosts to the pg_hba.conf file (on Debian and Ubuntu distributions, it is located at /etc/postgresql/<PostgreSQL version>/main/pg_hba.conf by default). Add lines that allow connecting to the database from the specified hosts:
  - If you use SSL:
    hostssl all         all <host address> md5
    hostssl replication all <host address> md5
  - If you do not use SSL:
    host    all         all <host address> md5
    host    replication all <host address> md5
- If a firewall is enabled in the source cluster, allow incoming connections from the Managed Service for PostgreSQL cluster hosts. For example, on Ubuntu 18.04:
  sudo ufw allow from <target cluster host address> to any port <port>
- Restart the PostgreSQL service to apply all your settings:
  sudo systemctl restart postgresql
- Check the PostgreSQL service status after restarting:
  sudo systemctl status postgresql
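Before restarting, you can sanity-check that both settings are present and uncommented in the configuration file. A minimal sketch; the temporary sample file here stands in for your real postgresql.conf path:

```shell
# Sketch: check that postgresql.conf has the required settings before restarting.
# CONF points at a sample file here; in practice, use your real path, e.g.
# /etc/postgresql/14/main/postgresql.conf
CONF=$(mktemp)
printf 'ssl = on\nwal_level = logical\n' > "$CONF"

ssl_status=$(grep -Eq '^[[:space:]]*ssl[[:space:]]*=[[:space:]]*on' "$CONF" && echo OK || echo missing)
wal_status=$(grep -Eq '^[[:space:]]*wal_level[[:space:]]*=[[:space:]]*logical' "$CONF" && echo OK || echo missing)
echo "ssl: $ssl_status"
echo "wal_level: $wal_status"
rm -f "$CONF"
```

If either check prints "missing", fix the file before restarting the service.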
Export the database schema in the source cluster
Use the pg_dump utility to create a file with the database schema to be applied in the target cluster.
pg_dump -h <IP address or FQDN of source cluster's master host> \
-U <username> \
-p <port> \
--schema-only \
--no-privileges \
--no-subscriptions \
-d <DB name> \
-Fd -f /tmp/db_dump
This export command skips all data associated with privileges and roles to avoid conflicts with the database settings in Nebius AI. If your database requires additional users, create them.
Restore the database schema in the target cluster
Using the pg_restore utility, restore the database schema in the target cluster:
pg_restore -h <IP address or FQDN of target cluster's master host> \
-U <username> \
-p 6432 \
-Fd -v \
--single-transaction \
-s --no-privileges \
-d <DB name> /tmp/db_dump
Create a publication and subscription
For logical replication to work, create a publication (a group of logically replicated tables) in the source cluster and a subscription (a description of connection to another database) on the target cluster.
- On the source cluster, create a publication for all the database tables. When migrating multiple databases, create a separate publication for each of them.
  Note
  You need superuser rights to create a publication for all tables, but not for a publication of selected tables only. For more information about creating publications, see the PostgreSQL documentation.
  Query:
  CREATE PUBLICATION p_data_migration FOR ALL TABLES;
- On the Managed Service for PostgreSQL cluster host, create a subscription with a connection string to the publication. For more information about creating subscriptions, see the PostgreSQL documentation.
  Query with SSL enabled:
  CREATE SUBSCRIPTION s_data_migration CONNECTION 'host=<source cluster address> port=<port> user=<username> sslmode=verify-full dbname=<database name>' PUBLICATION p_data_migration;
  Without SSL:
  CREATE SUBSCRIPTION s_data_migration CONNECTION 'host=<source cluster address> port=<port> user=<username> sslmode=disable dbname=<database name>' PUBLICATION p_data_migration;
- To get the replication status, check the pg_subscription_rel catalog. You can get the overall replication status in the target cluster using pg_stat_subscription and in the source cluster using pg_stat_replication.
  SELECT * FROM pg_subscription_rel;
  First of all, check the srsubstate field. The value r in this field means that synchronization has finished and the databases are ready for replication.
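The srsubstate value is a single letter: i (initializing), d (data is being copied), s (synchronized), r (ready). Before switching over, every table should be in state r. A small sketch of how you might tally the query output; the states variable stands in for the actual srsubstate column values:

```shell
# Sketch: tally srsubstate values from pg_subscription_rel.
# i = initializing, d = copying data, s = synchronized, r = ready.
# "states" stands in for the srsubstate column of the real query output.
states="r r r"
pending=0
for s in $states; do
  if [ "$s" != "r" ]; then pending=$((pending + 1)); fi
done
if [ "$pending" -eq 0 ]; then
  result="ready"
  echo "all tables are ready for replication"
else
  result="syncing"
  echo "$pending table(s) still syncing"
fi
```

Repeat the query until no tables remain in a non-r state.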
Migrate PostgreSQL sequences after replication
To complete synchronization of the source cluster and the target cluster:
- Switch the source cluster to read-only mode.
- Create a dump of the PostgreSQL sequences in the source cluster:
  pg_dump -h <IP address or FQDN of source cluster's master host> \
      -U <username> \
      -p <port> \
      -d <DB name> \
      --data-only -t '*.*_seq' > /tmp/seq-data.sql
  Take note of the *.*_seq pattern used. If the database you are migrating has sequences that do not match this pattern, specify a different pattern to export them. For more information about patterns, see the PostgreSQL documentation.
- Restore the dump with the sequences in the target cluster:
  psql -h <IP address or FQDN of target cluster's master host> \
      -U <username> \
      -p 6432 \
      -d <DB name> \
      < /tmp/seq-data.sql
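To see which qualified sequence names the *.*_seq pattern would cover, you can mimic it with shell globbing, since both match schema.name strings the same way here. A sketch with hypothetical sequence names; any sequence not ending in _seq, like the counter below, would need its own pattern:

```shell
# Sketch: which qualified sequence names match pg_dump's -t '*.*_seq' pattern.
# The sequence names below are hypothetical examples.
matched=0
skipped=0
for seq in public.users_id_seq public.orders_id_seq analytics.visit_counter; do
  case "$seq" in
    *.*_seq) echo "exported: $seq"; matched=$((matched + 1)) ;;
    *)       echo "NOT exported: $seq"; skipped=$((skipped + 1)) ;;
  esac
done
```

Here the first two names would be exported and the third would not, so a second -t pattern would be needed for it.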
Delete the subscription and switch over the load
- Delete the subscription in the target cluster:
  DROP SUBSCRIPTION s_data_migration;
- Switch over the load to the target cluster.
Delete the resources you created
Delete the resources you no longer need to avoid being charged for them.
Transferring data by creating and restoring a logical dump
Create a dump of the database you need in the source cluster using the pg_dump utility. To restore the dump in the target cluster, use the pg_restore utility.
Note
To use pg_restore, you might need the pg_repack database extension.
Migration stages:
- Create a dump of the database you want to migrate.
- (Optional) Create a virtual machine in Nebius AI and upload the DB dump to it.
- Restore data from the dump to the target cluster.
If you no longer need the resources you created, delete them.
Getting started
Create the required resources:
- Create a Managed Service for PostgreSQL target cluster with any suitable configuration. The following parameters must be the same as in the source cluster:
  - PostgreSQL version.
  - Username.
    Note
    You may use different usernames for the source and the target. However, this may result in an error when restoring from the dump. For more information, see Moving and restoring a PostgreSQL cluster.
- (Optional) Create a VM based on Ubuntu 20.04 LTS with the following parameters:
  - Disks and file storage → Size: Sufficient to store both the archived and unarchived dumps. The recommended size is two or more times the total size of the dump and the dump archive.
  - Network settings:
    - Public address: Select Auto or one address from the list of reserved IPs.
Create a database dump
- Switch the database to read-only mode.
- Create a dump using the pg_dump utility. To speed up the process, run it in multithreaded mode by passing the number of available CPU cores in the --jobs argument:
  pg_dump --host=<IP address or FQDN of the source cluster's master host> \
      --port=<port> \
      --username=<username> \
      --jobs=<number of CPU cores> \
      --format=d \
      --dbname=<database name> \
      --file=db_dump
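Rather than hard-coding the core count, you can derive the --jobs value from the machine itself. A sketch; the pg_dump invocation is only shown as a comment, with placeholder connection details:

```shell
# Sketch: derive the --jobs value from the number of available CPU cores.
jobs=$(nproc)
echo "using --jobs=$jobs"
# In practice you would then run pg_dump with this value, e.g.:
# pg_dump --host=<host> --port=<port> --username=<user> \
#     --jobs="$jobs" --format=d --dbname=<db> --file=db_dump
```

Note that --jobs only parallelizes the directory output format (--format=d) used here.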
(Optional) Upload the dump to a virtual machine in Nebius AI
Transfer your data to an intermediate VM in Compute Cloud if:
- Your Managed Service for PostgreSQL cluster is not accessible from the internet.
- Your hardware or connection to the cluster in Nebius AI is not very reliable.
The required amount of RAM and processor cores depends on the amount of data to migrate and the required migration speed.
To prepare the virtual machine to restore the dump:
- In the management console, create a new VM from an Ubuntu 20.04 image on Marketplace. The VM parameters depend on the size of the database you want to migrate. The minimum configuration (1 core, 2 GB RAM, 10 GB disk space) should be sufficient to migrate a database up to 1 GB in size. The bigger the database being migrated, the more RAM and disk space you need (at least twice the size of the database).
  The virtual machine must be in the same network and availability zone as the PostgreSQL cluster. Additionally, the VM must have a public IP address so that you can upload the dump from outside Nebius AI.
- Set up the PostgreSQL apt repository.
- Install the PostgreSQL client and additional utilities for working with the DBMS:
  sudo apt install postgresql-client-common
  # For PostgreSQL 10
  sudo apt install postgresql-client-10
  # For PostgreSQL 11
  sudo apt install postgresql-client-11
  # For PostgreSQL 12
  sudo apt install postgresql-client-12
  # For PostgreSQL 13
  sudo apt install postgresql-client-13
  # For PostgreSQL 14
  sudo apt install postgresql-client-14
- Archive the dump:
  tar -cvzf db_dump.tar.gz db_dump
- Copy the archive containing the dump to the VM, e.g., using the scp utility:
  scp db_dump.tar.gz <VM username>@<VM public address>:/db_dump.tar.gz
- Unpack the archive with the dump:
  tar -xzf db_dump.tar.gz
Restore data from the dump to the target cluster
Restore the database dump using the pg_restore utility.
The pg_restore version must match the pg_dump version, and its major version must be at least as high as that of the database the dump is restored to.
That is, to restore a dump of PostgreSQL 10, 11, 12, 13, or 14, use pg_restore 10, pg_restore 11, pg_restore 12, pg_restore 13, or pg_restore 14, respectively.
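This version check can be scripted. A sketch that compares major versions, shown with sample version strings; in practice, replace them with the output of pg_dump --version and pg_restore --version:

```shell
# Sketch: compare the major versions of pg_dump and pg_restore.
# Sample strings; in practice use: dump_ver=$(pg_dump --version), etc.
dump_ver="pg_dump (PostgreSQL) 14.9"
restore_ver="pg_restore (PostgreSQL) 14.9"

# Extract the first number in a version string (the major version).
major() { printf '%s\n' "$1" | grep -oE '[0-9]+' | head -n 1; }

if [ "$(major "$dump_ver")" = "$(major "$restore_ver")" ]; then
  check="match"
  echo "major versions match: $(major "$dump_ver")"
else
  check="mismatch"
  echo "major version mismatch: use a matching pg_restore"
fi
```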
pg_restore --host=<IP address or FQDN of the target cluster's master host> \
    --username=<username> \
    --dbname=<database name> \
    --port=6432 \
    --format=d \
    --verbose \
    --single-transaction \
    --no-privileges \
    db_dump
If you only need to restore a single schema, add the --schema=<schema name> parameter. Without this parameter, the command can only be run on behalf of the database owner.
If the restore is interrupted by errors about missing rights to create and update extensions, remove the --single-transaction parameter from the command. The errors will then be ignored:
pg_restore: warning: errors ignored on restore: 3
Make sure the errors only apply to the extensions and check the integrity of your restored data.
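One simple integrity check is comparing per-table row counts between the source and the target. A sketch with hard-coded sample counts; in practice, each count would come from a psql query against the respective cluster, as shown in the comments:

```shell
# Sketch: compare a table's row count on the source and the target.
# Sample values; in practice:
#   src_count=$(psql -h <source host> -p <port> -U <user> -d <db> \
#       -tAc "SELECT count(*) FROM <table>;")
#   dst_count=$(psql -h <target host> -p 6432 -U <user> -d <db> \
#       -tAc "SELECT count(*) FROM <table>;")
src_count=1042
dst_count=1042

if [ "$src_count" -eq "$dst_count" ]; then
  verdict="ok"
  echo "row counts match: $src_count"
else
  verdict="mismatch"
  echo "row count mismatch: source=$src_count target=$dst_count"
fi
```

Run the check for every important table; a mismatch means the restore needs investigating before you switch over the load.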
Delete the resources you created
Delete the resources you no longer need to avoid being charged for them:
- Delete the Managed Service for PostgreSQL cluster.
- If you created an intermediate virtual machine, delete it.