1
votes

Background:

  • We have two AWS Elasticsearch clusters on version 6.8 in the same AWS account and region.
  • We need to reindex one of the indices from cluster 1 (source) to cluster 2 (dest).

I tried to use the reindex API for 6.8 as described in the Elasticsearch documentation:

POST <https://endpoint of destination Elasticsearch>/_reindex 

    {
      "source": {
        "remote": {
          "host": "https://endpoint-of-source-elasticsearch-cluster-1.es.amazonaws.com"
        },
        "index": "source-index-name"
      },
      "dest": {
        "index": "destination-index-name"
      }
    }

Problem:

I'm getting the following error:

{
    "error": {
        "root_cause": [
            {
                "type": "x_content_parse_exception",
                "reason": "[8:3] [reindex] failed to parse field [source]"
            }
        ],
        "type": "x_content_parse_exception",
        "reason": "[8:3] [reindex] failed to parse field [source]",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "[host] must be of the form [scheme]://[host]:[port](/[pathPrefix])? but was [https://endpoint-of-source-elasticsearch-cluster-1.es.amazonaws.com]",
            "caused_by": {
                "type": "u_r_i_syntax_exception",
                "reason": "The port was not defined in the [host]: https://endpoint-of-source-elasticsearch-cluster-1.es.amazonaws.com"
            }
        }
    },
    "status": 400
}

Probable cause:

  1. The host parameter must contain a scheme, host, and port (e.g. https://otherhost:9200), as per the docs.
  2. Remote hosts have to be explicitly whitelisted in elasticsearch.yml using the reindex.remote.whitelist property.

Since I'm using AWS clusters, I'm not sure how to follow the scheme://host:port format or how to whitelist the source cluster, because I don't know how to make these changes on an AWS-managed cluster.

Please help if there is any workaround available. Thanks.


3 Answers

6
votes

Unfortunately, in AWS managed Elasticsearch you cannot modify static configuration settings such as reindex.remote.whitelist, because changing them requires editing the elasticsearch.yml file. AWS ES is a managed service, and there is currently no way for customers to access the OS or file system.

As alternatives,

  1. You can take a manual snapshot of your previous domain and restore it to the new domain. Compared to reindex from remote, this approach affects only one domain at a time, i.e. the one the snapshot is being taken from or the one it is being restored to.

  2. You can also use Logstash with the Elasticsearch input and output plugins to read data from the index in your original domain and index it into any other index or domain.
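
For the snapshot route, the sequence of REST calls can be sketched roughly as below. This is only an outline under assumptions that are not part of the question: the repository name, snapshot name, S3 bucket, and IAM role ARN are all hypothetical placeholders, and on AWS the repository registration request normally has to be signed with credentials that are allowed to pass that role. The script only builds and prints the requests; it does not send anything.

```python
import json

def snapshot_migration_steps(repo, snapshot, index, bucket, region, role_arn):
    """Return (domain, method, path, body) tuples for a snapshot-based copy.

    All names passed in are placeholders; nothing is sent over the network.
    """
    # S3 repository definition; it is registered on both domains so that
    # the destination can see the snapshot written by the source.
    repo_body = {
        "type": "s3",
        "settings": {"bucket": bucket, "region": region, "role_arn": role_arn},
    }
    # Limit the snapshot and the restore to the single index being moved.
    index_body = {"indices": index, "include_global_state": False}
    return [
        ("source", "PUT", f"/_snapshot/{repo}", repo_body),
        ("source", "PUT", f"/_snapshot/{repo}/{snapshot}", index_body),
        ("dest", "PUT", f"/_snapshot/{repo}", repo_body),
        ("dest", "POST", f"/_snapshot/{repo}/{snapshot}/_restore", index_body),
    ]

if __name__ == "__main__":
    steps = snapshot_migration_steps(
        repo="my-snapshot-repo",            # hypothetical repository name
        snapshot="snapshot-1",              # hypothetical snapshot name
        index="source-index-name",
        bucket="my-snapshot-bucket",        # hypothetical S3 bucket
        region="eu-west-1",
        role_arn="arn:aws:iam::123456789012:role/SnapshotRole",  # hypothetical
    )
    for domain, method, path, body in steps:
        print(f"[{domain}] {method} {path}")
        print(json.dumps(body, indent=2))
```

Note that a restore fails if an open index with the same name already exists on the destination domain, so delete or rename it first.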

2
votes

AWS Elasticsearch v7.9 now supports remote reindex. All you need to do is issue a reindex command, for example:

POST <local-domain-endpoint>/_reindex
{
  "source": {
    "remote": {
      "host": "https://remote-domain-endpoint:443"
    },
    "index": "remote_index"
  },
  "dest": {
    "index": "local_index"
  }
}

You must add :443 at the end of the remote domain endpoint, otherwise the host fails the validation check.
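
If you build the request from a script, a small helper can make sure the port is always there. This is just a sketch: the function name with_explicit_port is my own, and it assumes the endpoint has no path prefix (which is the case for AWS domain endpoints), so any path is dropped.

```python
from urllib.parse import urlsplit

def with_explicit_port(endpoint: str) -> str:
    """Return the endpoint as scheme://host:port.

    If no port is given, the scheme's default is used (443 for https,
    80 for http). Any path in the input is dropped.
    """
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected http(s)://host[:port], got {endpoint!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return f"{parts.scheme}://{parts.hostname}:{port}"

if __name__ == "__main__":
    # The host from the question, which was rejected for lacking a port.
    print(with_explicit_port(
        "https://endpoint-of-source-elasticsearch-cluster-1.es.amazonaws.com"
    ))
```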

To verify that the index is copied over to the local domain:

GET <local-domain-endpoint>/local_index/_search

If the remote index is in a region different from your local domain, pass in its region name, such as in this sample request:

POST <local-domain-endpoint>/_reindex
{
  "source": {
    "remote": {
      "host": "https://remote-domain-endpoint:443",
      "region": "eu-west-1"
    },
    "index": "test_index"
  },
  "dest": {
    "index": "local_index"
  }
}
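
Both request bodies above differ only in the optional region field, so they can be generated from one function. A minimal sketch, where remote_reindex_body is a name I made up and the script only prints the JSON rather than sending it:

```python
import json
from typing import Optional

def remote_reindex_body(remote_host: str, source_index: str,
                        dest_index: str, region: Optional[str] = None) -> dict:
    """Build the _reindex body; region is added only for cross-region sources."""
    remote = {"host": remote_host}
    if region is not None:
        remote["region"] = region
    return {
        "source": {"remote": remote, "index": source_index},
        "dest": {"index": dest_index},
    }

if __name__ == "__main__":
    body = remote_reindex_body(
        "https://remote-domain-endpoint:443",
        source_index="test_index",
        dest_index="local_index",
        region="eu-west-1",
    )
    # POST this to <local-domain-endpoint>/_reindex
    print(json.dumps(body, indent=2))
```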

Notes:

  1. You must include the port in the source address.
  2. With AWS Elasticsearch, you no longer need to whitelist the source IP/address as you do with standard Elasticsearch; AWS Elasticsearch assumes that the source address is trusted when you issue this command.

The AWS Elasticsearch documentation is here for reference: https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/remote-reindex.html

1
votes

I never managed to get the remote reindex feature working with AWS ES; it just doesn't work for me.

The good old elasticdump never fails:

elasticdump \
  --input=https://xxxx.eu-west-1.es.amazonaws.com:443/index-name \
  --output=https://xxxx.eu-west-1.es.amazonaws.com:443/index-name \
  --type=data \
  --limit=500 \
  --concurrency=5

To install without sudo:

# Install node without root
curl -s https://webinstall.dev/node | bash

# Install elasticdump
npm install elasticdump -g