Hello elasticsearch users/experts,
I have a bit of trouble understanding the race condition problem with the reindex api of Elasticsearch and would like to hear if anyone has found a solution about it.
I have searched a lot of places and could not find any clear solution (most of the solutions date back to before the reindex api).
As you might know, the (now) standard way of reindexing a document (after changing the mapping, for example) is to use an alias. Suppose the alias points to "old_index". We then create a new index called "new_index" with the new mapping, we call the reindex api to reindex the documents from 'old_index' to 'new_index' and then switch the alias to point to the new_index (and remove the alias pointer to old_index). It seems this is the standard way of reindexing and that is what I have seen on almost all recent websites I visited.
My questions are the following, for using this method, while I would not want downtime (so the user should still be able to search documents), and I would still want to be able to inject documents to ElasticSearch while the reindexing process is happening :
- If documents would still be incoming while the reindexing process is working (which would probably take a lot of time), how would the reindexing process ensure that the document would be ingested in the old index (to be able to search for it while the reindexing process is working) but still would be correctly reindexed to the new index?
- If a document is modified in the old index, after it has been reindexed (mapped to the new index), while the reindexing process is working, how would ElasticSearch ensure that this modification is also taken account in the new index?
- (Similar to 2.) If a record is deleted in the old index, after it has been reindexed (mapped to the new index), while the reindexing process is working, how would ElasticSearch ensure that this removal is also taken account in the new index?
Basically in a scenario where it is not affordable to make any indexing mistake for a document, how would one be able to proceed to make sure the reindexing goes without any of the above problems?
Has anyone any idea? And if there is no solution without any downtime, how would we proceed with the least amount of downtime in that case?
Thanks in advance!