Migrating data from Elasticsearch
There are two mechanisms to move data from a source Elasticsearch cluster to a target Managed Service for OpenSearch cluster:
- Snapshots. This method is suitable for Elasticsearch clusters of version 7.11 or lower. To learn more about snapshots, see the OpenSearch documentation.
- Remote reindexing (reindex data). Use this mechanism to move your existing indices, aliases, or data streams. This method is suitable for all Elasticsearch clusters of version 7.
Migration using snapshots
To migrate data from a source cluster in Elasticsearch to a target cluster in Managed Service for OpenSearch using snapshots:
- Create a snapshot in the source cluster.
- Restore the snapshot in the target cluster.
- Complete your migration.
If you no longer need the resources you are using, delete them.
Getting started
Prepare the infrastructure
- Create an Object Storage bucket with restricted access. This bucket will be used as the snapshot repository.
- Create a service account and add it to the editors group. The service account is required to access the bucket from the source and target clusters.
- Create a static access key for the service account.
Warning
Save the key ID and secret key. You will need them in the next steps.
- Create a target Managed Service for OpenSearch cluster in the required configuration with the following settings:
- Plugin: repository-s3.
- Public access to a group of DATA nodes.
Complete the configuration and check access to resources
- Set up the Elasticsearch source cluster:
- Install the repository-s3 plugin on all cluster hosts.
- For the repository-s3 plugin to work, restart the Elasticsearch and Kibana services on all cluster hosts.
- Make sure the Elasticsearch source cluster can access the internet.
- Make sure you can connect to the target Managed Service for OpenSearch cluster using the OpenSearch API and Dashboards.
Create a snapshot on the source cluster
- Connect the bucket as a snapshot repository on the source cluster:
- Add the static access key information to the Elasticsearch keystore.
Note
Run this procedure on all hosts of the source cluster.
Add the following:
- Key ID:
$ES_PATH/bin/elasticsearch-keystore add s3.client.default.access_key
- Secret key:
$ES_PATH/bin/elasticsearch-keystore add s3.client.default.secret_key
Note
The path to Elasticsearch ($ES_PATH) depends on the selected installation method. To find the path to your Elasticsearch installation, see the installation documentation (for example, for DEB or RPM).
- Reload the secure settings from the keystore:
curl --request POST "https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_nodes/reload_secure_settings"
- Register the repository:
curl --request PUT \
  "https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_snapshot/<repository name>" \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "s3",
    "settings": {
      "bucket": "<bucket name>",
      "endpoint": "storage.ai.nebius.cloud"
    }
  }'
To learn more about registering the repository, see the plugin documentation.
Alert
If a bucket is registered in an Elasticsearch cluster as a snapshot repository, do not edit the bucket contents manually: this will break the Elasticsearch snapshot mechanism.
- Start creating a snapshot in the repository you registered in the previous step. You can create a snapshot of the entire cluster or of only some of the data. For more information, see the Elasticsearch documentation.
Example of creating a snapshot named snapshot_1 for the entire cluster:
curl --request PUT \
  "https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_snapshot/<repository name>/snapshot_1?wait_for_completion=false&pretty"
Creating a snapshot may take a long time. Track the operation progress using the Elasticsearch tools, for example:
curl --request GET \
  "https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_snapshot/<repository name>/snapshot_1/_status?pretty"
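The keystore step above has to be repeated on every source host, and elasticsearch-keystore normally prompts for each value. A minimal sketch that passes the values on standard input with the tool's --stdin option instead. The function name add_s3_keys is hypothetical, and ES_PATH is assumed to point at your Elasticsearch home directory (for DEB and RPM packages this is usually /usr/share/elasticsearch).

```shell
#!/usr/bin/env bash
# Hypothetical helper: store the static access key in the Elasticsearch
# keystore without interactive prompts. Run it on every source cluster host.
# Assumes ES_PATH points at the Elasticsearch home directory and that the
# keystore already exists.
add_s3_keys() {
  local key_id=$1 secret=$2
  local keystore="$ES_PATH/bin/elasticsearch-keystore"

  # --stdin makes elasticsearch-keystore read the value from standard input,
  # so the secret is not typed interactively on each host.
  printf '%s' "$key_id" | "$keystore" add --stdin s3.client.default.access_key
  printf '%s' "$secret" | "$keystore" add --stdin s3.client.default.secret_key
}

# Example (placeholders as in the steps above):
# export ES_PATH=/usr/share/elasticsearch
# add_s3_keys "<key ID>" "<secret key>"
```

The secure settings still have to be reloaded afterwards with the _nodes/reload_secure_settings request.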
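Instead of rerunning the _status request by hand, you can poll it until the snapshot finishes. A sketch under these assumptions: the response contains a line such as "state" : "SUCCESS"; SUCCESS means done; FAILED, PARTIAL, or ABORTED mean the snapshot did not complete cleanly; any other state is treated as still running. The function names and the POLL_INTERVAL variable are hypothetical.

```shell
#!/usr/bin/env bash
# Print the first "state" value found in a _snapshot/<repo>/<snapshot>/_status
# response read on stdin (works for both pretty and compact JSON).
snapshot_state() {
  grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | sed -E 's/.*"([A-Z_]*)"$/\1/'
}

# Poll until the snapshot reaches a final state.
# Arguments: a command (with its arguments) that prints the _status response.
# POLL_INTERVAL sets the pause between requests in seconds (default 30).
wait_for_snapshot() {
  local state
  while :; do
    state=$("$@" | snapshot_state)
    case "$state" in
      SUCCESS) echo "$state"; return 0 ;;
      FAILED | PARTIAL | ABORTED) echo "$state"; return 1 ;;
      "") echo "no state found in the response" >&2; return 2 ;;
    esac
    sleep "${POLL_INTERVAL:-30}"
  done
}

# Example:
# wait_for_snapshot curl --silent --request GET \
#   "https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_snapshot/<repository name>/snapshot_1/_status?pretty"
```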
Restore a snapshot on the target cluster
- Attach the Object Storage bucket to the target cluster. On the target cluster, this bucket is used as a read-only snapshot repository:
curl --request PUT \
  "https://admin:<admin user password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_snapshot/<repository name>" \
  --cacert ~/.opensearch/root.crt \
  --header 'Content-Type: application/json' \
  --data '{
    "type": "s3",
    "settings": {
      "bucket": "<bucket name>",
      "readonly": "true",
      "endpoint": "storage.ai.nebius.cloud"
    }
  }'
- Select how to restore indices on the target cluster.
With the default settings, restoring an index fails if the cluster already has an open index with the same name. Even Managed Service for OpenSearch clusters without user data have open system indices (such as .apm-custom-link or .kibana_*) that may interfere with the restore operation. To avoid this, use one of the following methods:
- Migrate only your custom indices. Existing system indices are not migrated: the import affects only the indices created by users on the source cluster.
- Use the rename_pattern and rename_replacement parameters to rename indices as they are restored. To learn more, see the OpenSearch documentation.
Example of restoring the entire snapshot:
curl --request POST \
  "https://admin:<admin user password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_snapshot/<repository name>/snapshot_1/_restore" \
  --cacert ~/.opensearch/root.crt
- Start restoring data from the snapshot on the target cluster.
Example of restoring a snapshot while specifying the user indices to restore on the target cluster:
curl --request POST \
  "https://admin:<admin user password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_snapshot/<repository name>/snapshot_1/_restore?wait_for_completion=false&pretty" \
  --cacert ~/.opensearch/root.crt \
  --header 'Content-Type: application/json' \
  --data '{
    "indices": "<list of indices>"
  }'
Where <list of indices> is a comma-separated list of the indices to restore (for example, my_index*, my_index_2.*).
Restoring a snapshot may take a long time. To check the restore status, run this command:
curl --request GET \
  "https://admin:<admin user password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_snapshot/<repository name>/snapshot_1/_status?pretty" \
  --cacert ~/.opensearch/root.crt
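The rename_pattern and rename_replacement method described in this section has no example of its own. A minimal sketch of a request body that restores the listed indices under a common prefix, assuming that a prefix is an acceptable naming scheme for you; restore_body and the restored_ prefix are hypothetical. rename_pattern is a regular expression, and $1 in rename_replacement refers to its first capture group, so with the body below an index named my_index is restored as restored_my_index.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a _restore request body that restores the given
# indices under new names, so they cannot clash with indices already open on
# the target cluster.
# $1: comma-separated list of indices, for example "my_index*,my_index_2.*"
# $2: prefix to prepend to every restored index name (must be lowercase)
restore_body() {
  # The format string is single-quoted, so $1 stays literal in the JSON and
  # is interpreted by the cluster as a reference to the "(.+)" capture group.
  printf '{"indices":"%s","rename_pattern":"(.+)","rename_replacement":"%s$1"}\n' "$1" "$2"
}

# Example:
# curl --request POST \
#   "https://admin:<admin user password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_snapshot/<repository name>/snapshot_1/_restore?wait_for_completion=false&pretty" \
#   --cacert ~/.opensearch/root.crt \
#   --header 'Content-Type: application/json' \
#   --data "$(restore_body 'my_index*' 'restored_')"
```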
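The _status request above reports mainly on the snapshot itself. To follow the restore on the target cluster shard by shard, the cat recovery API (GET _cat/recovery?v) is another option: shards being restored from a snapshot are listed with the type snapshot. A sketch of a hypothetical helper that counts such shards that have not yet reached the done stage, assuming the type and stage column names of the _cat/recovery?v header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read GET _cat/recovery?v output on stdin and print how
# many shards are still being restored from a snapshot.
pending_snapshot_shards() {
  awk '
    # Locate the "type" and "stage" columns in the header line.
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "type") t = i
        if ($i == "stage") s = i
      }
      next
    }
    # Count snapshot recoveries that are not finished yet.
    t && s && tolower($t) == "snapshot" && tolower($s) != "done" { n++ }
    END { print n + 0 }'
}

# Example:
# curl --silent --cacert ~/.opensearch/root.crt \
#   "https://admin:<admin user password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_cat/recovery?v" |
#   pending_snapshot_shards
```

A result of 0 means that no shard is still being restored from a snapshot.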
Complete your migration
- Make sure all the indices you need have been transferred to the target Managed Service for OpenSearch cluster and that the number of documents in each of them is the same as in the source cluster:
Bash
Run this command:
curl \
  --user <username in the target cluster>:<user password in the target cluster> \
  --cacert ~/.opensearch/root.crt \
  --request GET 'https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_cat/indices?v'
The list should contain the indices transferred from Elasticsearch, with the number of documents in each shown in the docs.count column.
OpenSearch Dashboards
- Connect to the target cluster using OpenSearch Dashboards.
- Select the Global tenant.
- Open the control panel by clicking ☰.
- Under OpenSearch Plugins, select Index Management.
- Go to Indices.
The list should contain the indices transferred from Elasticsearch, with the number of documents in each shown in the Total documents column.
- If required, disconnect the snapshot repository from the source and target clusters.
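Comparing docs.count values by eye gets tedious when there are many indices. A sketch, assuming you saved the output of GET _cat/indices?v from the source and target clusters to two files (the file names in the example are hypothetical). compare_counts is a hypothetical helper; it skips indices whose names start with a dot (system indices, which are not expected to be migrated), and closed indices, which may have no docs.count value, are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare document counts between two saved outputs of
# GET _cat/indices?v. $1: source cluster output, $2: target cluster output.
# Prints one line per problem and exits with status 1 if any index is missing
# on the target or has a different docs.count.
compare_counts() {
  awk '
    # Locate the "index" and "docs.count" columns in each file header.
    FNR == 1 {
      ic = 0; dc = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "index") ic = i
        if ($i == "docs.count") dc = i
      }
      next
    }
    NR == FNR { src[$ic] = $dc; next }   # first file: source cluster
    { dst[$ic] = $dc }                   # second file: target cluster
    END {
      bad = 0
      for (name in src) {
        if (name ~ /^\./) continue       # skip system indices
        if (!(name in dst)) {
          print name ": missing in target"; bad = 1
        } else if (src[name] != dst[name]) {
          print name ": source=" src[name] " target=" dst[name]; bad = 1
        }
      }
      exit bad
    }' "$1" "$2"
}

# Example:
# curl ... 'https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_cat/indices?v' > source.txt
# curl ... 'https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_cat/indices?v' > target.txt
# compare_counts source.txt target.txt && echo "document counts match"
```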
Delete the resources you created
Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:
- Delete the service account.
- Delete snapshots from the bucket and then delete the entire bucket.
- Delete the Managed Service for OpenSearch cluster.
Migration using reindexing
To migrate data from a source cluster in Elasticsearch to a target cluster in Managed Service for OpenSearch through reindexing:
- Configure the target cluster.
- Start reindexing.
- Check the result.
If you no longer need the resources you created, delete them.
Getting started
- Create a target Managed Service for OpenSearch cluster in the required configuration with public access to a group of nodes with the DATA role.
- Install an SSL certificate:
Linux (Bash) and macOS (Zsh)
mkdir -p ~/.opensearch && \
wget "https://storage.nemax.nebius.cloud/certs/CA.pem" \
  --output-document ~/.opensearch/root.crt && \
chmod 0600 ~/.opensearch/root.crt
The certificate is saved to the ~/.opensearch/root.crt file.
Windows (PowerShell)
mkdir $HOME\.opensearch; curl.exe -o $HOME\.opensearch\root.crt https://storage.nemax.nebius.cloud/certs/CA.pem
The certificate is saved to the $HOME\.opensearch\root.crt file.
- Make sure you can connect to the target Managed Service for OpenSearch cluster using the OpenSearch API and Dashboards.
- Make sure the Elasticsearch source cluster can access the internet.
- In the source cluster, create a user with the monitoring_user and viewer roles.
Configure the target cluster
- Create a role with the create_index and write privileges for all indices (*).
- Create a user and assign this role to it.
Tip
In Managed Service for OpenSearch clusters, you can run reindexing as the admin user, who has the superuser role; however, it is more secure to create a separate user with limited privileges for each job. For more information, see Managing OpenSearch users.
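If you create the role through the API rather than in OpenSearch Dashboards, its definition might look like the sketch below. It mirrors only the two privileges named above on the index pattern *. The JSON layout follows the OpenSearch Security plugin REST API; the function name and the role name in the example are hypothetical, and whether your managed cluster lets you call that API is an assumption to verify. If reindexing is rejected for lack of permissions, the role may need more than these two action groups.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a role definition that grants the create_index
# and write action groups on all indices (*).
reindex_role_body() {
  cat <<'EOF'
{
  "index_permissions": [
    {
      "index_patterns": ["*"],
      "allowed_actions": ["create_index", "write"]
    }
  ]
}
EOF
}

# Example (the role name reindex_writer is hypothetical):
# curl --user admin:<admin user password> \
#   --cacert ~/.opensearch/root.crt \
#   --request PUT \
#   "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_plugins/_security/api/roles/reindex_writer" \
#   --header 'Content-Type: application/json' \
#   --data "$(reindex_role_body)"
```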
Start reindexing
- Get the list of nodes in the target cluster.
- To start reindexing, run the request against a target cluster node with the DATA role:
curl --user <username in the target cluster>:<user password in the target cluster> \
  --cacert ~/.opensearch/root.crt \
  --request POST \
  "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_reindex?wait_for_completion=false&pretty" \
  --header 'Content-Type: application/json' \
  --data '{
    "source": {
      "remote": {
        "host": "https://<IP address or FQDN of the node with the DATA role in the source cluster>:9200",
        "username": "<username in the source cluster>",
        "password": "<user password in the source cluster>"
      },
      "index": "<name of the index, alias, or data stream in the source cluster>"
    },
    "dest": {
      "index": "<name of the index, alias, or data stream in the target cluster>"
    }
  }'
Result:
{ "task" : "<ID of the reindexing job>" }
To transfer several indices, use a for loop:
for index in <names of indices, aliases, or data streams separated by spaces>; do
  curl --user <username in the target cluster>:<user password in the target cluster> \
    --cacert ~/.opensearch/root.crt \
    --request POST \
    "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_reindex?wait_for_completion=false&pretty" \
    --header 'Content-Type: application/json' \
    --data '{
      "source": {
        "remote": {
          "host": "https://<IP address or FQDN of the node with the DATA role in the source cluster>:9200",
          "username": "<username in the source cluster>",
          "password": "<user password in the source cluster>"
        },
        "index": "'"$index"'"
      },
      "dest": {
        "index": "'"$index"'"
      }
    }'
done
Result:
{ "task" : "<ID of reindexing job 1>" }
{ "task" : "<ID of reindexing job 2>" }
...
To learn more about reindexing parameters, see the OpenSearch documentation.
Reindexing may take a long time. To check the operation status, run this command:
curl --user <username in the target cluster>:<user password in the target cluster> \
  --cacert ~/.opensearch/root.crt \
  --request GET \
  "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_tasks/<ID of the reindexing job>"
- To cancel reindexing, run this command:
curl --user <username in the target cluster>:<user password in the target cluster> \
  --cacert ~/.opensearch/root.crt \
  --request POST \
  "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_tasks/<ID of the reindexing job>/_cancel"
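Building the request body by splicing shell quotes into the JSON, as in the for loop above, produces invalid JSON if a password or index name contains a double quote or backslash. A minimal sketch of hypothetical helpers that escape those two characters before assembling the _reindex body; other control characters are not handled.

```shell
#!/usr/bin/env bash
# Escape backslashes and double quotes so a value can sit inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print the body of a POST _reindex request that pulls
# one index, alias, or data stream from the remote source cluster.
# $1 source host URL, $2 source username, $3 source password,
# $4 source index name, $5 target index name (defaults to $4)
reindex_body() {
  printf '{"source":{"remote":{"host":"%s","username":"%s","password":"%s"},"index":"%s"},"dest":{"index":"%s"}}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" \
    "$(json_escape "$4")" "$(json_escape "${5:-$4}")"
}

# Example (replaces the --data argument in the for loop):
# for index in logs metrics; do
#   curl --user <username in the target cluster>:<user password in the target cluster> \
#     --cacert ~/.opensearch/root.crt \
#     --request POST \
#     "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_reindex?wait_for_completion=false&pretty" \
#     --header 'Content-Type: application/json' \
#     --data "$(reindex_body "https://<source host>:9200" "<username in the source cluster>" "<user password in the source cluster>" "$index")"
# done
```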
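With several reindexing jobs running, it helps to collect their IDs and check each one. A sketch of two hypothetical helpers: task_ids extracts the job IDs from the {"task" : ...} lines printed by the requests above, and task_outcome classifies a GET _tasks/<ID of the reindexing job> response as running, done, or failed. It assumes the response carries a "completed" flag and, once finished, a reindex "response" object with a "failures" array that is empty on success; treating an "error" or "canceled" field as failure is an assumption as well.

```shell
#!/usr/bin/env bash
# Print the reindexing job IDs found in {"task" : "<ID>"} responses on stdin,
# one ID per line.
task_ids() {
  grep -o '"task" *: *"[^"]*"' | sed -E 's/.*"([^"]*)"$/\1/'
}

# Read a GET _tasks/<task ID> response on stdin and print running, done,
# or failed.
task_outcome() {
  local body
  body=$(cat)
  if ! printf '%s\n' "$body" | grep -q '"completed" *: *true'; then
    echo running
  elif printf '%s\n' "$body" | grep -q -e '"error"' -e '"canceled"' ||
       ! printf '%s\n' "$body" | grep -q '"failures" *: *\[ *]'; then
    echo failed
  else
    echo done
  fi
}

# Example (reindex-output.txt is a hypothetical file with the loop output):
# for id in $(task_ids < reindex-output.txt); do
#   printf '%s: ' "$id"
#   curl --silent --user <username in the target cluster>:<user password in the target cluster> \
#     --cacert ~/.opensearch/root.crt \
#     "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_tasks/$id" |
#     task_outcome
# done
```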
Check the result
Make sure all the indices you need have been transferred to the target Managed Service for OpenSearch cluster, and the number of documents in them is the same as in the source cluster:
Bash
Run this command:
curl \
--user <username in the target cluster>:<user password in the target cluster> \
--cacert ~/.opensearch/root.crt \
--request GET 'https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_cat/indices?v'
The list should contain the indices transferred from Elasticsearch, with the number of documents in each shown in the docs.count column.
OpenSearch Dashboards
- Connect to the target cluster using OpenSearch Dashboards.
- Select the Global tenant.
- Open the control panel by clicking ☰.
- Under OpenSearch Plugins, select Index Management.
- Go to Indices.
The list should contain the indices transferred from Elasticsearch, with the number of documents in each shown in the Total documents column.
Delete the resources you created
Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need: