Migrating data from Elasticsearch


There are two mechanisms to move data from a source Elasticsearch cluster to a target Managed Service for OpenSearch cluster:

Snapshots

This method is suitable for Elasticsearch clusters of version 7.11 or lower.

To learn more about snapshots, see the OpenSearch documentation.

Remote reindexing (reindex data)

You can use this mechanism to move your existing indices, aliases, or data streams. This method is suitable for all Elasticsearch clusters of version 7.
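
The version limits above can be expressed as a quick check. The sketch below is only an illustration: the migration_methods helper is hypothetical, it expects a version in MAJOR.MINOR or MAJOR.MINOR.PATCH form, and it encodes nothing beyond the two rules stated in this section.

```shell
# Print the migration mechanisms this guide lists for a given
# Elasticsearch version (hypothetical helper, not part of any CLI).
migration_methods() {
  local version="$1"
  local major="${version%%.*}"   # text before the first dot
  local rest="${version#*.}"     # text after the first dot
  local minor="${rest%%.*}"      # second version component
  local methods=""

  # Snapshots: versions 7.11 or lower.
  if [ "$major" -lt 7 ] || { [ "$major" -eq 7 ] && [ "$minor" -le 11 ]; }; then
    methods="snapshots"
  fi
  # Remote reindexing: any version 7 cluster.
  if [ "$major" -eq 7 ]; then
    methods="${methods:+$methods }reindexing"
  fi
  echo "${methods:-none}"
}
```

For example, migration_methods 7.10.2 prints both mechanisms, while migration_methods 7.17.0 prints only reindexing. The version of a running cluster is reported in the version.number field of the response to a GET request on the cluster root URL.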

Migration using snapshots

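To migrate data from a source cluster in Elasticsearch to a target cluster in Managed Service for OpenSearch using snapshots:

1. Create a snapshot on the source cluster.
2. Restore a snapshot on the target cluster.
3. Complete your migration.

If you no longer need the resources you are using, delete them.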

Getting started

Prepare the infrastructure

Create an Object Storage bucket with restricted access. This bucket will be used as a snapshot repository.
Create a service account and add it to the editors group. A service account is required to access the bucket from the source and target clusters.
Create a static access key for the service account.

Warning

Save the key ID and secret key. You will need them in the next steps.

Create a target Managed Service for OpenSearch cluster in the required configuration with the following settings:
- Plugin: repository-s3.
- Public access to a group of nodes with the DATA role.

Complete the configuration and check access to resources

Set up the Elasticsearch source cluster:
1. Install the plugin repository-s3 on all cluster hosts.
2. For the repository-s3 plugin to work, restart the Elasticsearch and Kibana services on all cluster hosts.
3. Make sure the Elasticsearch source cluster can access the internet.
Install an SSL certificate.
Make sure you can connect to the target Managed Service for OpenSearch cluster using the OpenSearch API and Dashboards.

Create a snapshot on the source cluster

Connect the bucket as a snapshot repository on the source cluster:
1. Add the static access key information to the Elasticsearch keystore.
  
  Note
  
  Run the procedure on all hosts of the source cluster.
  
  Add the following:
  - Key ID:
```
$ES_PATH/bin/elasticsearch-keystore add s3.client.default.access_key
```
  - Secret key:
```
$ES_PATH/bin/elasticsearch-keystore add s3.client.default.secret_key
```
  Note
  
  The path to Elasticsearch ($ES_PATH) depends on the selected installation method. To find a path to your Elasticsearch installation, see the installation documentation (for example, for DEB, RPM).
2. Reload the secure settings from the keystore:
```
curl --request POST "https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_nodes/reload_secure_settings"
```
3. Register the repository:
```
curl --request PUT \
     "https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_snapshot/<repository name>" \
     --header 'Content-Type: application/json' \
     --data '{
       "type": "s3",
       "settings": {
         "bucket": "<bucket name>",
         "endpoint": "storage.ai.nebius.cloud"
       }
     }'
```
To learn more about adding the repository, see the plugin documentation.
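
After registering the repository, it can be worth confirming that the cluster nodes can actually reach the bucket. Elasticsearch provides a repository verification endpoint (_verify) for this. The sketch below only builds the endpoint URL; the repo_verify_url helper and the host and repository values in the usage comment are hypothetical.

```shell
# Build the URL of the repository verification endpoint.
# Arguments: source-cluster host (IP address or FQDN), repository name.
repo_verify_url() {
  printf 'https://%s:9200/_snapshot/%s/_verify\n' "$1" "$2"
}

# Usage (placeholder values):
#   curl --request POST "$(repo_verify_url es-data-1.example.internal my_repo)"
```

A successful response lists the nodes that were able to access the repository.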

Alert

If a bucket is registered in an Elasticsearch cluster as a snapshot repository, do not edit the bucket contents manually, as this will disrupt the Elasticsearch snapshot mechanism.

Start creating a snapshot in the repository you registered in the previous step. You can create a snapshot of the entire cluster or some of the data. For more information, see the Elasticsearch documentation.

Example of creating a snapshot named snapshot_1 for the entire cluster:
```
curl --request PUT \
     "https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_snapshot/<repository name>/snapshot_1?wait_for_completion=false&pretty"
```
Creating a snapshot may take a long time. Track the operation progress using the Elasticsearch tools, for example:
```
curl --request GET \
     "https://<IP address or FQDN of the host with the DATA role in the source cluster>:9200/_snapshot/<repository name>/snapshot_1/_status?pretty"
```
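
If you prefer not to rerun the status request by hand, the check can be wrapped in a polling loop. This is a minimal sketch under a few assumptions: the helper names are hypothetical, the state is extracted with grep and sed rather than a JSON parser, and the extraction relies on the snapshot-level state field being the first state field in the _status response.

```shell
# Extract the snapshot state (for example STARTED or SUCCESS) from a
# _status response read on stdin.
snapshot_state() {
  grep -o '"state" *: *"[A-Z_]*"' | head -n 1 | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Poll every 30 seconds while the snapshot is still running.
# Arguments: base URL of a source-cluster node, repository name, snapshot name.
# Returns 0 only if the final state is SUCCESS.
wait_for_snapshot() {
  local base_url="$1" repo="$2" snap="$3" state
  while :; do
    state="$(curl --silent --request GET \
      "${base_url}/_snapshot/${repo}/${snap}/_status" | snapshot_state)"
    echo "snapshot state: ${state:-unknown}"
    case "$state" in
      INIT|STARTED) sleep 30 ;;   # still in progress, keep waiting
      *) break ;;                 # finished, failed, or no state found
    esac
  done
  [ "$state" = "SUCCESS" ]
}
```

A call such as wait_for_snapshot "https://<host>:9200" "<repository name>" snapshot_1 stops as soon as the state is anything other than INIT or STARTED, so an unreachable host ends the loop instead of blocking forever.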

Restore a snapshot on the target cluster

Attach the Object Storage bucket to the target cluster. It will be used as a read-only snapshot repository:

```
curl --request PUT \
     "https://admin:<admin user password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_snapshot/<repository name>" \
     --cacert ~/.opensearch/root.crt \
     --header 'Content-Type: application/json' \
     --data '{
       "type": "s3",
       "settings": {
         "bucket": "<bucket name>",
         "readonly" : "true",
         "endpoint": "storage.ai.nebius.cloud"
       }
     }'
```

Select how to restore an index on the target cluster.

With the default settings, an attempt to restore an index will fail in a cluster where the same-name index is already open. Even in Managed Service for OpenSearch clusters without user data, there are open system indices (such as .apm-custom-link or .kibana_*), which may interfere with the restore operation. To avoid this, use one of the following methods:
- Migrate only your custom indices. The existing system indices are not migrated. The import process only affects the user-created indices on the source cluster.
- Use the rename_pattern and rename_replacement parameters. Indices will be renamed as they are restored. To learn more, see the OpenSearch documentation.

Example of restoring the entire snapshot:
```
curl --request POST \
     "https://admin:<admin password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_snapshot/<repository name>/snapshot_1/_restore" \
     --cacert ~/.opensearch/root.crt
```
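
For the renaming method mentioned above, the restore request needs a body with the rename_pattern and rename_replacement parameters. The sketch below prints such a body; the restore_body_with_rename helper, the index pattern, and the prefix are hypothetical and chosen only for illustration.

```shell
# Print a restore request body that renames every restored index.
# Arguments: index pattern to restore, prefix to put before each name.
restore_body_with_rename() {
  local indices="$1" prefix="$2"
  # "(.+)" captures the whole original index name; "$1" in the replacement
  # refers to that capture group, so my_index becomes <prefix>my_index.
  printf '{"indices": "%s", "rename_pattern": "(.+)", "rename_replacement": "%s$1"}\n' \
    "$indices" "$prefix"
}
```

The output can be passed to the restore request with --header 'Content-Type: application/json' and --data "$(restore_body_with_rename 'my_index*' 'restored_')". The helper does no JSON escaping, so it is only suitable for plain index patterns and prefixes.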

Start restoring data from the snapshot on the target cluster.

Example of restoring a snapshot that specifies which user indices to restore on the target cluster:

```
curl --request POST \
     "https://admin:<admin user password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_snapshot/<repository name>/snapshot_1/_restore?wait_for_completion=false&pretty" \
     --cacert ~/.opensearch/root.crt \
     --header 'Content-Type: application/json' \
     --data '{
       "indices": "<list of indices>"
     }'
```

Where <list of indices> is a comma-separated list of the indices to restore, for example, my_index*, my_index_2.*.

Restoring a snapshot may take a long time. To check the restore status, run this command:

```
curl --request GET \
     "https://admin:<admin password>@<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_snapshot/<repository name>/snapshot_1/_status?pretty" \
     --cacert ~/.opensearch/root.crt
```

Complete your migration

Make sure all the indices you need have been transferred to the target Managed Service for OpenSearch cluster, and the number of documents in them is the same as in the source cluster:

Bash

Run this command:
```
curl \
    --user <username in the target cluster>:<user password in the target cluster> \
    --cacert ~/.opensearch/root.crt \
    --request GET 'https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_cat/indices?v'
```
The list should contain the indices transferred from Elasticsearch with the number of documents specified in the docs.count column.

OpenSearch Dashboards

1. Connect to the target cluster using OpenSearch Dashboards.
2. Select the Global tenant.
3. Open the control panel.
4. Under OpenSearch Plugins, select Index Management.
5. Go to Indices.
The list should contain the indices transferred from Elasticsearch with the number of documents specified in the Total documents column.

Disable the snapshot repository on the source and target clusters, if required.

Delete the resources you created

Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:

Delete the service account.
Delete snapshots from the bucket and then delete the entire bucket.
Delete the Managed Service for OpenSearch cluster.

Migration using reindexing

To migrate data from a source cluster in Elasticsearch to a target cluster in Managed Service for OpenSearch through reindexing:

1. Configure the target cluster.
2. Start reindexing.
3. Check the result.

If you no longer need the resources you created, delete them.

Getting started

Create a target Managed Service for OpenSearch cluster in the relevant configuration with public access to a group of nodes with the DATA role.

Install an SSL certificate:

Linux (Bash) and macOS (Zsh)

```
mkdir -p ~/.opensearch && \
wget "https://storage.nemax.nebius.cloud/certs/CA.pem" \
     --output-document ~/.opensearch/root.crt && \
chmod 0600 ~/.opensearch/root.crt
```

The certificate is saved to the ~/.opensearch/root.crt file.

Windows (PowerShell)

```
mkdir $HOME\.opensearch; curl.exe -o $HOME\.opensearch\root.crt https://storage.nemax.nebius.cloud/certs/CA.pem
```

The certificate is saved to the $HOME\.opensearch\root.crt file.

Make sure you can connect to the target Managed Service for OpenSearch cluster using the OpenSearch API and Dashboards.
Make sure the Elasticsearch source cluster can access the internet.
In the source cluster, create a user with the monitoring_user and viewer roles.

Configure the target cluster

Create a role with the create_index and write privileges for all indices (*).
Create a user and assign the user this role.
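
If you manage roles and users through the OpenSearch Security plugin REST API rather than through Dashboards, the two steps above correspond to two request bodies. This is a sketch under assumptions: it presumes the security API endpoints (_plugins/_security/api/roles and _plugins/_security/api/internalusers) are reachable in your cluster, which a managed service may restrict, and the helper names are hypothetical.

```shell
# Print the body for PUT _plugins/_security/api/roles/<role name>.
# Grants the create_index and write action groups on all indices (*).
reindex_role_body() {
  printf '%s\n' '{"index_permissions": [{"index_patterns": ["*"], "allowed_actions": ["create_index", "write"]}]}'
}

# Print the body for PUT _plugins/_security/api/internalusers/<username>.
# Arguments: password, role name. No JSON escaping is done, so avoid
# double quotes and backslashes in the values.
reindex_user_body() {
  printf '{"password": "%s", "opendistro_security_roles": ["%s"]}\n' "$1" "$2"
}
```

Each body would be sent with curl --request PUT, --header 'Content-Type: application/json', and --data "$(reindex_role_body)" or --data "$(reindex_user_body '<password>' '<role name>')".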

Tip

In Managed Service for OpenSearch clusters, you can run reindexing as the admin user, which has the superuser role. However, it is more secure to create a separate user with limited privileges for each job. For more information, see Managing OpenSearch users.

Start reindexing

Retrieve the list of nodes in the target cluster.

To start reindexing, send a request to a node with the DATA role in the target cluster:

```
curl --user <username in the target cluster>:<user password in the target cluster> \
     --cacert ~/.opensearch/root.crt \
     --request POST \
     "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_reindex?wait_for_completion=false&pretty" \
     --header 'Content-Type: application/json' \
     --data '{
       "source": {
         "remote": {
           "host": "https://<IP address or FQDN of the node with the DATA role in the source cluster>:9200",
           "username": "<username in the source cluster>",
           "password": "<user password in the source cluster>"
         },
         "index": "<name of the index, alias, or data stream in the source cluster>"
       },
       "dest": {
         "index": "<name of the index, alias, or data stream in the target cluster>"
       }
     }'
```

Result:

```
{
  "task" : "<ID of the reindexing job>"
}
```

To transfer several indices, use a for loop:

```
for index in <names of indices, aliases, or data streams separated by a space>; do
  curl --user <username in the target cluster>:<user password in the target cluster> \
       --cacert ~/.opensearch/root.crt \
       --request POST \
       "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_reindex?wait_for_completion=false&pretty" \
       --header 'Content-Type: application/json' \
       --data '{
         "source": {
           "remote": {
             "host": "https://<IP address or FQDN of the node with the DATA role in the source cluster>:9200",
             "username": "<username in the source cluster>",
             "password": "<user password in the source cluster>"
           },
           "index": "'"$index"'"
         },
         "dest": {
           "index": "'"$index"'"
         }
       }'
done
```

Result:

```
{
  "task" : "<ID of reindexing job 1>"
}
{
  "task" : "<ID of reindexing job 2>"
}
...
```

To learn more about reindexing parameters, see the OpenSearch documentation.

Reindexing may take a long time. To check the operation status, run this command:

```
curl --user <username in the target cluster>:<user password in the target cluster> \
     --cacert ~/.opensearch/root.crt \
     --request GET \
     "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_tasks/<ID of the reindexing job>"
```

To cancel reindexing, run this command:

```
curl --user <username in the target cluster>:<user password in the target cluster> \
     --cacert ~/.opensearch/root.crt \
     --request POST \
     "https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_tasks/<ID of the reindexing job>/_cancel"
```

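The reindexing request and the status check can be chained in a script: take the job ID from the _reindex response, then poll the _tasks endpoint until the job reports completion. The sketch below assumes the response shapes shown in this section and a completed field in the _tasks response; the helper names and the OS_USER and OS_PASSWORD environment variables are hypothetical.

```shell
# Extract the job ID from a _reindex response read on stdin.
task_id_from_response() {
  grep -o '"task" *: *"[^"]*"' | head -n 1 | sed 's/.*: *"\([^"]*\)"$/\1/'
}

# Succeed (exit 0) when a _tasks/<ID> response on stdin reports completion.
task_completed() {
  grep -q '"completed" *: *true'
}

# Poll the job every 30 seconds, up to a maximum number of attempts.
# Arguments: base URL of a target-cluster node, job ID, attempts (default 120).
# Returns 0 once the job is completed, 1 if the attempts run out.
wait_for_task() {
  local base_url="$1" task_id="$2" attempts="${3:-120}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl --silent --user "${OS_USER}:${OS_PASSWORD}" \
         --cacert ~/.opensearch/root.crt \
         --request GET "${base_url}/_tasks/${task_id}" | task_completed; then
      return 0
    fi
    i=$((i + 1))
    sleep 30
  done
  return 1
}
```

Completion only means the job has stopped running. Review the response of the finished job for reported failures before relying on the migrated data.
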
Check the result

Make sure all the indices you need have been transferred to the target Managed Service for OpenSearch cluster, and the number of documents in them is the same as in the source cluster:

Bash

Run this command:

```
curl \
    --user <username in the target cluster>:<user password in the target cluster> \
    --cacert ~/.opensearch/root.crt \
    --request GET 'https://<ID of the OpenSearch node with the DATA role>.mdb.nemax.nebius.cloud:9200/_cat/indices?v'
```

The list should contain the indices transferred from Elasticsearch with the number of documents specified in the docs.count column.

OpenSearch Dashboards

1. Connect to the target cluster using OpenSearch Dashboards.
2. Select the Global tenant.
3. Open the control panel.
4. Under OpenSearch Plugins, select Index Management.
5. Go to Indices.

The list should contain the indices transferred from Elasticsearch with the number of documents specified in the Total documents column.
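
Comparing document counts by eye gets error-prone with many indices. One possible approach is to save a two-column listing (index name and document count) from each cluster, for example by requesting _cat/indices?h=index,docs.count, and compare the files. The compare_doc_counts helper below is a hypothetical sketch of that comparison; it ignores indices that exist only on the target, such as system indices.

```shell
# Compare two "index docs.count" listings (one index per line) and report
# source indices that are missing on the target or whose counts differ.
# Arguments: file with the source listing, file with the target listing.
# Returns 0 when every source index is present with the same count.
compare_doc_counts() {
  awk '
    FILENAME == ARGV[1] { src[$1] = $2; next }   # first file: source counts
    { dst[$1] = $2 }                             # second file: target counts
    END {
      status = 0
      for (name in src) {
        if (!(name in dst)) {
          print "MISSING " name
          status = 1
        } else if (src[name] != dst[name]) {
          print "MISMATCH " name " source=" src[name] " target=" dst[name]
          status = 1
        }
      }
      exit status
    }
  ' "$1" "$2"
}
```

Counts can lag briefly behind writes until an index is refreshed, so it is reasonable to rerun the comparison after reindexing has fully finished.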

Delete the resources you created

Some resources are not free of charge. To avoid paying for them, delete the resources you no longer need:

Delete the Managed Service for OpenSearch cluster.