
I have 9000 documents in my ElasticSearch index.

I want to sort by an analyzed string field. From what I found on Google, to do that I must update the mapping to make the field not-analyzed so I can sort by it, and I must re-index the data to reflect the mapping change.
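
The mapping change I applied looked roughly like this (a sketch; the real field is not named here, so title is just a placeholder):

{
  "properties" : {
    "title" : {
      "type" : "string",
      "index" : "not_analyzed"
    }
  }
}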

The re-indexing process consumed about 20 minutes on my machine.

The strange thing is that the re-indexing process consumed about 2 hours on a very powerful production server.

I checked the memory status and the processor usage on that server and everything was normal.

What I want to know is:

  1. Is there a way to sort documents by an analyzed, tokenized field without re-indexing all the documents?

  2. If I must re-index all the documents, why does it take so much longer on the server? Or how can I trace the reason for the slowness on that server?

As to 1: Any change to the mapping requires a reindex. However, it's perfectly valid (and done very often) to have an analyzed field (for display / search purposes or whatever) and a not-analyzed field (containing the same data) for sorting, living side by side. Perhaps this answers the implicit use case I seem to infer from your question. – Geert-Jan
Having the same data mapped to multiple fields (as I suggest above) can be done easily with elasticsearch.org/guide/reference/mapping/multi-field-type.html, which doesn't require a change to your client code. – Geert-Jan
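
For illustration, a multi-field mapping along those lines could look roughly like this (a sketch based on the multi_field type from the linked docs; the field and sub-field names are placeholders):

{
  "properties" : {
    "title" : {
      "type" : "multi_field",
      "fields" : {
        "title" : { "type" : "string", "index" : "analyzed" },
        "raw"   : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}

Searches would then hit title as before, while sorting would use title.raw.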

1 Answer


As long as the field is stored in _source, I'm pretty sure you could use a script to create a custom sort field every time you search.

{
  "query" : { "query_string" : { "query" : "*:*" } },
  "sort" : {
    "_script" : {
      "script" : "<some sorting field>",
      "type" : "number",
      "params" : {},
      "order" : "asc"
    }
  }
}

This has the downside of re-evaluating the sorting script on the server side each time you search, but I think it solves (1).
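
As a concrete illustration, a script sort that reads the original value out of _source might look like this (a sketch assuming a string field named title and the default scripting of the Elasticsearch version the linked docs describe; both are assumptions):

{
  "query" : { "match_all" : {} },
  "sort" : {
    "_script" : {
      "script" : "_source.title",
      "type" : "string",
      "order" : "asc"
    }
  }
}

Reading from _source for every document is much slower than sorting on a not_analyzed field, so the multi-field approach from the comments is usually preferable.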