Recently we started to explore Solr partial index updates.
The API for full and partial updates looks similar. Instead of
doc.addField("location", "UK")
solrClient.add(doc)
you have to write
doc.addField("location", map("set", "Germany"))
solrClient.add(doc)
What I expected to happen: Solr updates the inverted index for the field "location" only.
What actually happens:
- Solr loads the stored fields of the document
- applies the given updates to the document
- deletes the document by id
- writes the document to the index
As a result, all non-stored fields are lost.
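To make the effect concrete, here is a minimal sketch of the four steps above using plain Java maps. It is a toy model and not Solr or Lucene code: `ToyIndex`, `atomicSet`, `isSearchable`, and the field named `body` are hypothetical names invented for illustration, and it assumes only `id` and `location` are stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class AtomicUpdateSketch {

    /** Hypothetical toy index: tracks searchable fields and the stored subset per document. */
    static class ToyIndex {
        private final Set<String> storedFieldNames;
        private final Map<String, Map<String, String>> searchable = new HashMap<>();
        private final Map<String, Map<String, String>> stored = new HashMap<>();

        ToyIndex(Set<String> storedFieldNames) {
            this.storedFieldNames = storedFieldNames;
        }

        /** Full update: every supplied field becomes searchable, only stored ones are kept verbatim. */
        void add(String id, Map<String, String> fields) {
            searchable.put(id, new HashMap<>(fields));
            Map<String, String> kept = new HashMap<>();
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (storedFieldNames.contains(e.getKey())) {
                    kept.put(e.getKey(), e.getValue());
                }
            }
            stored.put(id, kept);
        }

        void delete(String id) {
            searchable.remove(id);
            stored.remove(id);
        }

        /** Partial update, modelled on the observed behaviour: load, modify, delete, re-add. */
        void atomicSet(String id, String field, String value) {
            // 1. load stored fields (non-stored values cannot be read back)
            Map<String, String> doc = new HashMap<>(stored.getOrDefault(id, Map.of()));
            // 2. apply the update
            doc.put(field, value);
            // 3. delete the old document by id
            delete(id);
            // 4. write the rebuilt document
            add(id, doc);
        }

        boolean isSearchable(String id, String field) {
            Map<String, String> doc = searchable.get(id);
            return doc != null && doc.containsKey(field);
        }

        String searchableValue(String id, String field) {
            Map<String, String> doc = searchable.get(id);
            return doc == null ? null : doc.get(field);
        }
    }

    public static void main(String[] args) {
        // Assumption for this sketch: "body" is indexed but not stored.
        ToyIndex index = new ToyIndex(Set.of("id", "location"));
        index.add("1", Map.of("id", "1", "location", "UK", "body", "some long text"));

        index.atomicSet("1", "location", "Germany");

        System.out.println(index.searchableValue("1", "location")); // Germany
        System.out.println(index.isSearchable("1", "body"));        // false: the non-stored field is gone
    }
}
```

In this model the non-stored field disappears because step 1 can only read back what was stored, which matches what we see with our real index.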
I found some old mailing-list discussions where people say that this is expected behaviour and that you need to make all fields stored. We don't want to make all fields stored. The "stored" property was designed for fields that need to be returned to the caller in Solr responses. We only need a small amount of meta-info in responses, so making every field stored looks like overkill.
The question is: why does Solr/Lucene perform all these steps to execute a partial update? In my understanding, every field has its own inverted index located in its own file, so it should be possible to update fields independently. Judging by what actually happens, Solr/Lucene is unable to update the index for a single field, and I can't find the reason for that.
Discussions on this topic: