I have a core with millions of records.
I want to add a custom handler which scan the existing documents and update one of the field based on a condition (age>12 for example).
I prefer doing it on the Solr server side for avoiding sending millions of documents to the client and back.
I was thinking of writing a solr plugin which will receive a query and update some fields on the query documents (like the delete by query handler).
I was wondering whether there are existing solutions or better alternatives.
I was searching the web for a while and couldn't find examples of Solr plugins which update documents (I don't need to extend the update handler).
I've written a plug-in which use the following code which works fine but isn't as fast as I need.
Currently I do:
AddUpdateCommand addUpdateCommand = new AddUpdateCommand(solrQueryRequest);
DocIterator iterator = docList.iterator();
SolrIndexSearcher indexReader = solrQueryRequest.getSearcher();
while (iterator.hasNext()) {
Document document = indexReader.doc(iterator.nextDoc());
SolrInputDocument solrInputDocument = new SolrInputDocument();
addUpdateCommand.clear();
addUpdateCommand.solrDoc = solrInputDocument;
addUpdateCommand.solrDoc.setField("id", document.get("id"));
addUpdateCommand.solrDoc.setField("my_updated_field", new_value);
updateRequestProcessor.processAdd(addUpdateCommand);
}
But this is very expensive since the update handler will fetch again the document which I already hold at hand.
Is there a safe way to update the lucene document and write it back while taking into account all the Solr related code such as caches, extra solr logic, etc?
I was thinking of converting it to a SolrInputDocument and then just add the document through Solr but I need first to convert all fields.
Thanks in advance,
Avner