4
votes

I'm stuck on one point in my understanding of the Elasticsearch indexing process. I've already read this article, which says that the inverted index stores all tokens of all documents and is immutable. So, to update it, we must remove it and reindex all the data to make every document searchable.

But I've also read about partially updating documents (automatically marking the old one as "deleted" and inserting and indexing a new one). That article made no mention of reindexing all the previous data.

So what I don't properly understand is this: when I update a document (a text document with 100,000 words) and already have other indexed documents in storage, is it true that every UPDATE or INSERT operation will trigger reindexing of all my documents?

Basically, I rely on the default Elasticsearch settings (5 primary shards with one replica per shard, and 2 nodes in the cluster).

1
and I also suggest not asking more than one question in a post - bpgergo
Thanks, I've updated my question - maret

1 Answer

1
votes

You can simply update the document. An update reindexes that one document, which is basically the same as removing it from the index and adding it again; see: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/update-doc.html Elasticsearch marks the old version as deleted and indexes the new version into a new segment, so you don't need to reindex any other document.
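To make the "immutable, yet updatable" idea more concrete, here is a minimal toy model in Python. It is not how Lucene or Elasticsearch is actually implemented; `ToyIndex` and all of its methods are hypothetical names, and it assumes one segment per write and whitespace tokenization purely for illustration. The point it tries to show is that an update only tokenizes the updated document and marks the old copy as deleted, while existing segments are never rewritten.

```python
class ToyIndex:
    """Toy model of a segment-based index (illustration only, not Lucene)."""

    def __init__(self):
        self.segments = []      # each segment is written once and never modified
        self.deleted = []       # per-segment set of doc ids marked as deleted
        self.analyzed_docs = 0  # how many documents have been tokenized so far

    def index(self, doc_id, text):
        """Insert or update a document without touching existing segments."""
        # Mark any older copy of this document as deleted (a "tombstone").
        for seg_no, segment in enumerate(self.segments):
            if doc_id in segment["docs"]:
                self.deleted[seg_no].add(doc_id)

        # Build a small inverted index for this one document only.
        postings = {}
        for token in text.lower().split():
            postings.setdefault(token, set()).add(doc_id)

        # Append it as a new segment; older segments stay as they were.
        self.segments.append({"docs": frozenset([doc_id]), "postings": postings})
        self.deleted.append(set())
        self.analyzed_docs += 1

    def search(self, token):
        """Return ids of live documents containing the token."""
        hits = set()
        for segment, dead in zip(self.segments, self.deleted):
            hits |= segment["postings"].get(token.lower(), set()) - dead
        return hits


idx = ToyIndex()
idx.index(1, "quick brown fox")
idx.index(2, "lazy dog")
idx.index(1, "quick red fox")   # update of document 1

print(idx.search("brown"))      # old version of doc 1 is no longer visible
print(idx.search("lazy"))       # doc 2 was never reindexed
print(idx.analyzed_docs)        # one analysis per write, not per stored document
```

In this sketch, three writes cause exactly three documents to be analyzed, no matter how many documents are already stored. The real system also merges segments in the background to purge deleted documents, which the toy model leaves out.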

I'm not sure what you mean by a "save" operation; you may want to clarify it with an example.

As for the time required to update a document of 100K words, I suggest you try it out.