2 votes

There is a problem with our mappings in Elasticsearch 1.7. I am fixing it by creating a new index with the correct mappings. I understand that since I am creating a new index, I will have to reindex the existing data from the old index into the new one. The problem is that I have googled around and can't find a way to reindex from old to new; it seems the Reindex API was only introduced in ES 2.3 and is not supported in 1.7.

My question is: how do I reindex my data from the old index to the new one after fixing my mappings? Alternatively, what is the best practice for making mapping changes in ES 1.7?

  1. https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html will not work for me because we're on an old version of ES (1.7)
  2. https://www.elastic.co/blog/changing-mapping-with-zero-downtime I initially went down that path but got stuck; I still need a way to reindex from the old index to the new one (see the sketch just below this list)
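
For reference, the reindex step that the blog post assumes can be done client-side with the scan/scroll and bulk APIs, both of which do exist in 1.7. A minimal sketch, assuming the official Python elasticsearch client in a 1.x-compatible release; the host and index names are placeholders:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])

def actions():
    # scan/scroll through every document in the old index
    for hit in helpers.scan(es, index="old_index", scroll="5m"):
        yield {
            "_index": "new_index",  # the index with the fixed mappings
            "_type": hit["_type"],
            "_id": hit["_id"],
            "_source": hit["_source"],
        }

# stream the documents into the new index via the bulk API
helpers.bulk(es, actions())
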
It would probably be a good investment to upgrade Elasticsearch. 5.x supports remote reindex, so you can pull your data from 1.7. Otherwise you're flogging a dead horse and any investment here (like a custom reindex strategy) is pretty much wasted. – xeraa
Not always possible. We use something on top of Elastic which locks us to v1 of Elastic, so answers to this question are legitimately useful. – Craig Brett
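
For anyone who can take xeraa's advice, a 5.x cluster can reindex from remote and pull documents straight out of a 1.7 cluster. A rough sketch with the 5.x Python client; the host names are placeholders, and the 5.x cluster must also list the old host under reindex.remote.whitelist in elasticsearch.yml:

from elasticsearch import Elasticsearch

# client pointed at the new 5.x cluster
es5 = Elasticsearch(["new-host:9200"])

es5.reindex(body={
    "source": {
        "remote": {"host": "http://old-host:9200"},  # the 1.7 cluster
        "index": "index",
    },
    "dest": {"index": "new_index"},
})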

2 Answers

1 vote

Late for your use case, but wanted to put it out there for others. This is an excellent step-by-step guide on how to reindex an Elasticsearch index using Logstash version 1.5 while maintaining the integrity of the original data: http://david.pilato.fr/blog/2015/05/20/reindex-elasticsearch-with-logstash/

This is the logstash-simple.conf the author creates:

input {
  # We read from the "old" cluster
  elasticsearch {
    hosts => [ "localhost" ]
    port => "9200"
    index => "index"
    size => 500
    scroll => "5m"
    docinfo => true
  }
}

filter {
  mutate {
    remove_field => [ "@timestamp", "@version" ]
  }
}

output {
  # We write to the "new" cluster
  elasticsearch {
    host => "localhost"
    port => "9200"
    protocol => "http"
    index => "new_index"
    index_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
  # We print dots to see it in action
  stdout {
    codec => "dots"
  }
}
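
If you have Logstash 1.5 installed, running this should be as simple as bin/logstash -f logstash-simple.conf. The dots codec prints one dot per document, so you can watch the reindex progress.
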
0 votes

There are a few options for you:

Use Logstash: it's very easy to create a reindex config in Logstash and use it to reindex your documents. For example:

input {
  elasticsearch {
    hosts => [ "localhost" ]
    port => "9200"
    index => "index1"
    size => 1000
    scroll => "5m"
    docinfo => true
  }
}


output {
  elasticsearch {
    host => "localhost"
    port => "9200"
    protocol => "http"
    index => "index2"
    index_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}

The problem with this approach is that it'll be relatively slow, since you'll have only a single machine performing the reindexing process.

Another option is to use this tool. It'll be faster than Logstash, but you'll have to provide segmentation logic for your documents to speed up the processing. For example, if you have a numeric field whose values range from 1 to 100, you could segment the queries in the tool into, say, 10 intervals (1-10, 11-20, ..., 91-100), so the tool will spawn 10 indexers that reindex your old index in parallel.
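
To make the segmentation idea concrete, here is a rough sketch in Python, assuming the official elasticsearch client and a hypothetical numeric field called price with values between 1 and 100; each worker process reindexes one range, so the ranges are processed in parallel:

from multiprocessing import Pool

from elasticsearch import Elasticsearch, helpers

def reindex_range(bounds):
    lo, hi = bounds
    # one client per worker process
    es = Elasticsearch(["localhost:9200"])
    query = {"query": {"range": {"price": {"gte": lo, "lte": hi}}}}
    actions = (
        {
            "_index": "index2",
            "_type": hit["_type"],
            "_id": hit["_id"],
            "_source": hit["_source"],
        }
        for hit in helpers.scan(es, index="index1", query=query, scroll="5m")
    )
    helpers.bulk(es, actions)

if __name__ == "__main__":
    # ten intervals: (1, 10), (11, 20), ..., (91, 100)
    ranges = [(i, i + 9) for i in range(1, 100, 10)]
    Pool(len(ranges)).map(reindex_range, ranges)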