0
votes

I have a Solr core with 100K to 1,000K documents.
I have a scenario where I need to add or set a field value on most of the documents.
Doing it through Solr takes too much time.
I was wondering whether there is a way to do such a task with the Lucene library, accessing the Solr index directly (with less overhead).
If needed, I can shut down the core, run my code, and reload the core afterwards (hoping it will take less time than doing it through Solr).
It would be great to hear whether anyone has already done this and what the major pitfalls are.

2 Answers

0
votes

A similar problem has been discussed multiple times on the Lucene Java mailing list. The underlying problem is that you cannot update a document in place in Lucene (and hence in Solr).

Instead, you need to delete the document and insert a new one. This obviously adds the overhead of re-analyzing the document, merging index segments, etc. Still, the number of documents you mention isn't major, and the update should not take days (have you tried updating Solr with multiple threads?).

You can of course try doing this via Lucene and see whether it makes any difference, but you need to be absolutely sure you use the same analyzers as Solr does.
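
For the multi-threaded route, here is a minimal sketch of what I mean, not a tested recipe. It assumes your Solr version supports atomic updates (the `{"set": ...}` syntax) and that the fields involved are stored; even then Solr still rewrites the whole document internally, so this only cuts client and network overhead. The names `update_in_parallel`, `post_batch`, the batch size, and the worker count are all made up for illustration, and `update_url` is whatever your core's update endpoint is.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive lists of at most `size` items from a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_atomic_updates(doc_ids, field, value, id_field="id"):
    """Build one atomic "set" update per document id (assumes `id_field` is the unique key)."""
    return [{id_field: doc_id, field: {"set": value}} for doc_id in doc_ids]


def post_batch(update_url, docs):
    """POST one batch of updates as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def update_in_parallel(update_url, doc_ids, field, value, batch_size=1000, workers=4):
    """Send the updates in batches from several threads (hypothetical helper)."""
    batches = [
        build_atomic_updates(chunk, field, value)
        for chunk in chunked(doc_ids, batch_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda docs: post_batch(update_url, docs), batches))
```

The point is to send many documents per request and to commit once at the end rather than after every batch; sending one document per request with a commit each time is usually what makes this kind of job slow.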

0
votes

I have a scenario where I need to add or set a field value on most of the documents.

If you have to do this often, it may be worth looking at something like ExternalFileField. It has limitations, but it may be better than hacking around Solr's infrastructure by going directly to Lucene.
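
To give an idea of what that involves, here is a rough sketch of generating the external file. As far as I know, ExternalFileField reads plain `key=value` lines with float values from a file named `external_<fieldname>` in the index data directory, and the field is meant for function queries and sorting rather than normal searching. The function name `write_external_file` and the sample keys are my own; check the data directory and field name against your own setup.

```python
import os
import tempfile


def write_external_file(data_dir, field_name, values):
    """Write {doc_key: number} as sorted key=value lines to external_<field_name>.

    The file is written to a temporary name first and then renamed,
    so a reader never sees a half-written file.
    """
    path = os.path.join(data_dir, "external_" + field_name)
    fd, tmp_path = tempfile.mkstemp(dir=data_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for key in sorted(values):
            # Values are written as floats, one document per line.
            handle.write(f"{key}={float(values[key])}\n")
    os.replace(tmp_path, path)
    return path
```

Calling `write_external_file(data_dir, "rank", {"doc2": 3, "doc1": 1.5})` produces a file with the lines `doc1=1.5` and `doc2=3.0`. Solr only picks up the new values when it reloads the file, so the rewrite has to be followed by whatever reload step your configuration uses.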