1
votes

I'm using Elasticsearch 2.3.3 and trying to make an exact copy of an existing index, using the reindex plugin bundled with the Elasticsearch installation.

The problem is that the data is copied but settings such as the mapping and the analyzer are left out.

What is the best way to make an exact copy of an existing index, including all of its settings?

My main goal is to create a copy, change the copy and, only if all goes well, switch an alias to the copy (zero-downtime backup and restore).

3
The best way to achieve this would be to leverage index templates. Have you tried them out? - Val
I'll give it a try - thank you. - Boris Milner

3 Answers

3
votes

In my opinion, the best way to achieve this would be to leverage index templates. Index templates allow you to store a specification of your index, including settings (hence analyzers) and mappings. Then whenever you create a new index which matches your template, ES will create the index for you using the settings and mappings present in the template.

So, first create an index template called index_template with the template pattern myindex-*:

PUT /_template/index_template
{
  "template": "myindex-*",
  "settings": {
    ... your settings ...
  },
  "mappings": {
    "type1": {
      "properties": {
         ... your mapping ...
      }
    }
  }
}

From then on, whenever you index a document into a not-yet-existing index whose name matches myindex-*, ES will create that index using the settings and mappings from this template.

So say your current index is called myindex-1 and you want to reindex it into a new index called myindex-2. You'd send a reindex query like this one:

POST /_reindex
{
  "source": {
    "index": "myindex-1"
  },
  "dest": {
    "index": "myindex-2"
  }
}

myindex-2 doesn't exist yet, but it will be created in the process using the settings and mappings of index_template because the name myindex-2 matches the myindex-* pattern.

Simple as that.
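
For the alias switch mentioned in the question, a minimal sketch follows. It assumes an alias named myalias (a hypothetical name, not taken from the question) that currently points at myindex-1, and that URL holds the cluster's base address. The helper only builds the request body for the _aliases endpoint, which applies the remove and add actions in one atomic call; the commented curl line shows how it might be sent.

```shell
#!/bin/bash
# Print the JSON body for POST /_aliases that moves an alias
# from one index to another in a single atomic request.
# Arguments: alias name, current index, new index.
alias_swap_body() {
  local alias_name="$1"
  local old_index="$2"
  local new_index="$3"
  cat <<EOF
{
  "actions": [
    { "remove": { "index": "${old_index}", "alias": "${alias_name}" } },
    { "add": { "index": "${new_index}", "alias": "${alias_name}" } }
  ]
}
EOF
}

# Possible usage ("myalias" is a hypothetical alias name; URL is assumed
# to be set to the cluster address, e.g. http://localhost:9200):
#   alias_swap_body myalias myindex-1 myindex-2 |
#     curl -s -H 'Content-Type: application/json' -XPOST "${URL:?}/_aliases" -d @-
```

Printing the body separately makes it easy to inspect before anything is sent to the cluster.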

0
votes

The following seems to achieve exactly what I wanted:

Using Snapshot And Restore I was able to restore to a different index:

POST /_snapshot/index_backup/snapshot_1/_restore
{
  "indices": "original_index",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "original_index",
  "rename_replacement": "replica_index"
}

As far as I can currently tell, it has accomplished exactly what I needed: a 1-to-1 copy of my original index.

I also suspect this operation has better performance than re-indexing for my purposes.
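
The restore call presupposes that a repository named index_backup and a snapshot named snapshot_1 already exist. A rough sketch of how they might be created follows. The fs repository type and the location /mount/backups are assumptions for illustration (a shared-filesystem location also has to be listed under path.repo in elasticsearch.yml), and URL is assumed to hold the cluster address. The helpers only print request bodies.

```shell
#!/bin/bash
# Print the body for PUT /_snapshot/<repo>: registers a shared-filesystem
# repository. The location passed in is an assumption; adjust to your setup.
repo_body() {
  local location="$1"
  cat <<EOF
{
  "type": "fs",
  "settings": { "location": "${location}" }
}
EOF
}

# Print the body for PUT /_snapshot/<repo>/<snapshot>: snapshots a single
# index and leaves the cluster-wide state out of the snapshot.
snapshot_body() {
  local index="$1"
  cat <<EOF
{
  "indices": "${index}",
  "ignore_unavailable": true,
  "include_global_state": false
}
EOF
}

# Possible usage (URL assumed to be set, e.g. http://localhost:9200):
#   repo_body /mount/backups |
#     curl -s -H 'Content-Type: application/json' -XPUT "${URL:?}/_snapshot/index_backup" -d @-
#   snapshot_body original_index |
#     curl -s -H 'Content-Type: application/json' -XPUT \
#       "${URL:?}/_snapshot/index_backup/snapshot_1?wait_for_completion=true" -d @-
```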

0
votes

I'm facing the same issue when using the reindex API. Basically I'm merging daily, weekly, monthly indices to reduce shards.

We have a lot of indices with different data inputs, and maintaining a template for all cases is not an option. Thus we use dynamic mapping.

With dynamic mapping, the reindex process can produce conflicts if your data is complicated (say, JSON stored in a string field), and a reindexed field can end up with a different type than it had in the source index.

Solution:

  1. Copy the mapping of your source index
  2. Create a new index, applying the mapping
  3. Disable dynamic mapping
  4. Start the reindex process.

A script can be created, and should of course have error checking in place. Abbreviated scripts below.

Create a new empty index with the mapping from an original index:

#!/bin/bash
SRC=$1
DST=$2

# Create a temporary file for holding the SRC mapping
TMPF=$(mktemp)

# Extract the SRC mapping, use `jq` to get the first record and
# write it to TMPF (URL must be set in the environment)
curl -f -s "${URL:?}/${SRC}/_mapping" | jq -M -c 'first(.[])' > "${TMPF:?}"

# Create the new index
curl -s -H 'Content-Type: application/json' -XPUT ${URL:?}/${DST} -d @${TMPF:?}

# Disable dynamic mapping
curl -s -H 'Content-Type: application/json' -XPUT \
    ${URL:?}/${DST}/_mapping -d '{ "dynamic": false }'

Start reindexing:

curl -s -XPOST "${URL:?}/_reindex" -H 'Content-Type: application/json' -d'
{
    "conflicts": "proceed",
    "source": {
        "index": "'${SRC}'"
    },
    "dest": {
        "index": "'${DST}'",
        "op_type": "create"
    }
}'
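
As one possible form of the error checking mentioned earlier, the document counts of the two indices could be compared once the reindex has finished. The sketch below assumes the responses come from the _count endpoint, whose JSON contains a numeric count field; the helper names are made up for this example. Extraction is done with sed to avoid extra dependencies, though jq would work as well.

```shell
#!/bin/bash
# Extract the numeric "count" field from a _count response read on stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed only if both responses report the same, non-empty count.
# Arguments: source _count response, destination _count response.
counts_match() {
  local src_count dst_count
  src_count=$(printf '%s\n' "$1" | extract_count)
  dst_count=$(printf '%s\n' "$2" | extract_count)
  [ -n "${src_count}" ] && [ "${src_count}" = "${dst_count}" ]
}

# Possible usage (URL, SRC and DST as in the scripts above):
#   src_json=$(curl -f -s "${URL:?}/${SRC:?}/_count")
#   dst_json=$(curl -f -s "${URL:?}/${DST:?}/_count")
#   counts_match "${src_json}" "${dst_json}" || echo "document counts differ" >&2
```

The destination index may need a refresh before its count reflects all reindexed documents, so a mismatch right after the reindex call is not necessarily an error.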